

Advanced Calculus with 
Applications in Statistics 

Second Edition 



André I. Khuri 


WILEY SERIES IN PROBABILITY AND STATISTICS 



Advanced Calculus with 
Applications in Statistics 


Second Edition 
Revised and Expanded 


André I. Khuri 

University of Florida 
Gainesville, Florida 



WILEY-INTERSCIENCE


A JOHN WILEY & SONS, INC., PUBLICATION 








Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. 

Published simultaneously in Canada. 

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, 
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without 
either the prior written permission of the Publisher, or authorization through payment of the 
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, 
Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. 
Requests to the Publisher for permission should be addressed to the Permissions Department, 
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 
748-6008, e-mail: permreq@wiley.com. 

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to
the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives or written sales materials. The advice and
strategies contained herein may not be suitable for your situation. You should consult with
a professional where appropriate. Neither the publisher nor author shall be liable for any
loss of profit or any other commercial damages, including but not limited to special,
incidental, consequential, or other damages. 

For general information on our other products and services please contact our Customer 
Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or 
fax 317-572-4002. 

Wiley also publishes its books in a variety of electronic formats. Some content that appears 
in print, however, may not be available in electronic format. 

Library of Congress Cataloging-in-Publication Data 

Khuri, André I., 1940- 

Advanced calculus with applications in statistics / André I. Khuri. - 2nd ed. rev. and
expanded.

p. cm. -- (Wiley series in probability and statistics)

Includes bibliographical references and index.

ISBN 0-471-39104-2 (cloth : alk. paper)

1. Calculus. 2. Mathematical statistics. I. Title. II. Series.

QA303.2.K48 2003

515-dc21 2002068986 

Printed in the United States of America 


10 987654321 



To Ronnie, Marcus, and Roxanne 

and 

In memory of my sister Ninette 



Contents 


Preface xv

Preface to the First Edition xvii

1. An Introduction to Set Theory 1

1.1. The Concept of a Set, 1

1.2. Set Operations, 2

1.3. Relations and Functions, 4

1.4. Finite, Countable, and Uncountable Sets, 6

1.5. Bounded Sets, 9

1.6. Some Basic Topological Concepts, 10

1.7. Examples in Probability and Statistics, 13
Further Reading and Annotated Bibliography, 15
Exercises, 17

2. Basic Concepts in Linear Algebra 21

2.1. Vector Spaces and Subspaces, 21

2.2. Linear Transformations, 25

2.3. Matrices and Determinants, 27

2.3.1. Basic Operations on Matrices, 28

2.3.2. The Rank of a Matrix, 33

2.3.3. The Inverse of a Matrix, 34

2.3.4. Generalized Inverse of a Matrix, 36

2.3.5. Eigenvalues and Eigenvectors of a Matrix, 36

2.3.6. Some Special Matrices, 38

2.3.7. The Diagonalization of a Matrix, 38

2.3.8. Quadratic Forms, 39

2.3.9. The Simultaneous Diagonalization 
of Matrices, 40 

2.3.10. Bounds on Eigenvalues, 41 

2.4. Applications of Matrices in Statistics, 43 

2.4.1. The Analysis of the Balanced Mixed Model, 43 

2.4.2. The Singular-Value Decomposition, 45

2.4.3. Extrema of Quadratic Forms, 48 

2.4.4. The Parameterization of Orthogonal 
Matrices, 49 

Further Reading and Annotated Bibliography, 50 
Exercises, 53 


3. Limits and Continuity of Functions 57 

3.1. Limits of a Function, 57 

3.2. Some Properties Associated with Limits of Functions, 63 

3.3. The o, O Notation, 65

3.4. Continuous Functions, 66

3.4.1. Some Properties of Continuous Functions, 71

3.4.2. Lipschitz Continuous Functions, 75

3.5. Inverse Functions, 76

3.6. Convex Functions, 79

3.7. Continuous and Convex Functions in Statistics, 82
Further Reading and Annotated Bibliography, 87 
Exercises, 88 

4. Differentiation 93

4.1. The Derivative of a Function, 93

4.2. The Mean Value Theorem, 99 

4.3. Taylor’s Theorem, 108 

4.4. Maxima and Minima of a Function, 112 

4.4.1. A Sufficient Condition for a Local Optimum, 114 

4.5. Applications in Statistics, 115 

4.5.1. Functions of Random Variables, 116 

4.5.2. Approximating Response Functions, 121 

4.5.3. The Poisson Process, 122 

4.5.4. Minimizing the Sum of Absolute Deviations, 124
Further Reading and Annotated Bibliography, 125 
Exercises, 127 





5. Infinite Sequences and Series 132

5.1. Infinite Sequences, 132

5.1.1. The Cauchy Criterion, 137

5.2. Infinite Series, 140

5.2.1. Tests of Convergence for Series
of Positive Terms, 144

5.2.2. Series of Positive and Negative Terms, 158

5.2.3. Rearrangement of Series, 159

5.2.4. Multiplication of Series, 162

5.3. Sequences and Series of Functions, 165

5.3.1. Properties of Uniformly Convergent Sequences
and Series, 169

5.4. Power Series, 174

5.5. Sequences and Series of Matrices, 178

5.6. Applications in Statistics, 182

5.6.1. Moments of a Discrete Distribution, 182

5.6.2. Moment and Probability Generating
Functions, 186

5.6.3. Some Limit Theorems, 191

5.6.3.1. The Weak Law of Large Numbers
(Khinchine’s Theorem), 192

5.6.3.2. The Strong Law of Large Numbers
(Kolmogorov’s Theorem), 192

5.6.3.3. The Continuity Theorem for Probability
Generating Functions, 192

5.6.4. Power Series and Logarithmic Series
Distributions, 193

5.6.5. Poisson Approximation to Power Series
Distributions, 194

5.6.6. A Ridge Regression Application, 195
Further Reading and Annotated Bibliography, 197 
Exercises, 199 


6. Integration 205

6.1. Some Basic Definitions, 205

6.2. The Existence of the Riemann Integral, 206

6.3. Some Classes of Functions That Are Riemann
Integrable, 210

6.3.1. Functions of Bounded Variation, 212 





6.4. Properties of the Riemann Integral, 215

6.4.1. Change of Variables in Riemann Integration, 219

6.5. Improper Riemann Integrals, 220

6.5.1. Improper Riemann Integrals of the Second
Kind, 225

6.6. Convergence of a Sequence of Riemann Integrals, 227

6.7. Some Fundamental Inequalities, 229

6.7.1. The Cauchy-Schwarz Inequality, 229

6.7.2. Hölder’s Inequality, 230

6.7.3. Minkowski’s Inequality, 232

6.7.4. Jensen’s Inequality, 233

6.8. Riemann-Stieltjes Integral, 234

6.9. Applications in Statistics, 239

6.9.1. The Existence of the First Negative Moment of a
Continuous Distribution, 242

6.9.2. Transformation of Continuous Random
Variables, 246

6.9.3. The Riemann-Stieltjes Representation of the
Expected Value, 249

6.9.4. Chebyshev’s Inequality, 251 

Further Reading and Annotated Bibliography, 252 
Exercises, 253 

7. Multidimensional Calculus 261 

7.1. Some Basic Definitions, 261

7.2. Limits of a Multivariable Function, 262

7.3. Continuity of a Multivariable Function, 264

7.4. Derivatives of a Multivariable Function, 267

7.4.1. The Total Derivative, 270

7.4.2. Directional Derivatives, 273

7.4.3. Differentiation of Composite Functions, 276

7.5. Taylor’s Theorem for a Multivariable Function, 277

7.6. Inverse and Implicit Function Theorems, 280

7.7. Optima of a Multivariable Function, 283

7.8. The Method of Lagrange Multipliers, 288

7.9. The Riemann Integral of a Multivariable Function, 293

7.9.1. The Riemann Integral on Cells, 294

7.9.2. Iterated Riemann Integrals on Cells, 295

7.9.3. Integration over General Sets, 297

7.9.4. Change of Variables in n-Tuple Riemann
Integrals, 299





7.10. Differentiation under the Integral Sign, 301

7.11. Applications in Statistics, 304 

7.11.1. Transformations of Random Vectors, 305 

7.11.2. Maximum Likelihood Estimation, 308 

7.11.3. Comparison of Two Unbiased 
Estimators, 310 

7.11.4. Best Linear Unbiased Estimation, 311 

7.11.5. Optimal Choice of Sample Sizes in Stratified 
Sampling, 313 

Further Reading and Annotated Bibliography, 315 
Exercises, 316 


8. Optimization in Statistics 327 

8.1. The Gradient Methods, 329 

8.1.1. The Method of Steepest Descent, 329 

8.1.2. The Newton-Raphson Method, 331 

8.1.3. The Davidon-Fletcher-Powell Method, 331 

8.2. The Direct Search Methods, 332 

8.2.1. The Nelder-Mead Simplex Method, 332 

8.2.2. Price’s Controlled Random Search 
Procedure, 336 

8.2.3. The Generalized Simulated Annealing 
Method, 338 

8.3. Optimization Techniques in Response Surface 
Methodology, 339 

8.3.1. The Method of Steepest Ascent, 340 

8.3.2. The Method of Ridge Analysis, 343 

8.3.3. Modified Ridge Analysis, 350 

8.4. Response Surface Designs, 355 

8.4.1. First-Order Designs, 356 

8.4.2. Second-Order Designs, 358 

8.4.3. Variance and Bias Design Criteria, 359 

8.5. Alphabetic Optimality of Designs, 362 

8.6. Designs for Nonlinear Models, 367 

8.7. Multiresponse Optimization, 370 

8.8. Maximum Likelihood Estimation and the 
EM Algorithm, 372 

8.8.1. The EM Algorithm, 375 

8.9. Minimum Norm Quadratic Unbiased Estimation of 
Variance Components, 378 





8.10. Scheffé’s Confidence Intervals, 382 

8.10.1. The Relation of Scheffé’s Confidence Intervals 
to the F-Test, 385 

Further Reading and Annotated Bibliography, 391 
Exercises, 395 

9. Approximation of Functions 403 

9.1. Weierstrass Approximation, 403 

9.2. Approximation by Polynomial Interpolation, 410 

9.2.1. The Accuracy of Lagrange Interpolation, 413 

9.2.2. A Combination of Interpolation and 
Approximation, 417 

9.3. Approximation by Spline Functions, 418 

9.3.1. Properties of Spline Functions, 418 

9.3.2. Error Bounds for Spline Approximation, 421 

9.4. Applications in Statistics, 422 

9.4.1. Approximate Linearization of Nonlinear Models 
by Lagrange Interpolation, 422 

9.4.2. Splines in Statistics, 428 

9.4.2.1. The Use of Cubic Splines in
Regression, 428

9.4.2.2. Designs for Fitting Spline Models, 430

9.4.2.3. Other Applications of Splines in
Statistics, 431

Further Reading and Annotated Bibliography, 432 
Exercises, 434 

10. Orthogonal Polynomials 437 

10.1. Introduction, 437 

10.2. Legendre Polynomials, 440 

10.2.1. Expansion of a Function Using Legendre 
Polynomials, 442 

10.3. Jacobi Polynomials, 443 

10.4. Chebyshev Polynomials, 444 

10.4.1. Chebyshev Polynomials of the First Kind, 444 

10.4.2. Chebyshev Polynomials of the Second Kind, 445 

10.5. Hermite Polynomials, 447 

10.6. Laguerre Polynomials, 451 

10.7. Least-Squares Approximation with Orthogonal 
Polynomials, 453 





10.8. Orthogonal Polynomials Defined on a Finite Set, 455 

10.9. Applications in Statistics, 456 

10.9.1. Applications of Hermite Polynomials, 456 

10.9.1.1. Approximation of Density Functions 
and Quantiles of Distributions, 456 

10.9.1.2. Approximation of a Normal
Integral, 460

10.9.1.3. Estimation of Unknown
Densities, 461

10.9.2. Applications of Jacobi and Laguerre
Polynomials, 462

10.9.3. Calculation of Hypergeometric Probabilities
Using Discrete Chebyshev Polynomials, 462

Further Reading and Annotated Bibliography, 464 

Exercises, 466 

11. Fourier Series 471

11.1. Introduction, 471

11.2. Convergence of Fourier Series, 475

11.3. Differentiation and Integration of Fourier Series, 483

11.4. The Fourier Integral, 488

11.5. Approximation of Functions by Trigonometric
Polynomials, 495

11.5.1. Parseval’s Theorem, 496

11.6. The Fourier Transform, 497

11.6.1. Fourier Transform of a Convolution, 499

11.7. Applications in Statistics, 500

11.7.1. Applications in Time Series, 500

11.7.2. Representation of Probability Distributions, 501

11.7.3. Regression Modeling, 504

11.7.4. The Characteristic Function, 505

11.7.4.1. Some Properties of Characteristic
Functions, 510

Further Reading and Annotated Bibliography, 510 
Exercises, 512 

12. Approximation of Integrals 517

12.1. The Trapezoidal Method, 517 

12.1.1. Accuracy of the Approximation, 518 

12.2. Simpson’s Method, 521 

12.3. Newton-Cotes Methods, 523 



XIV 


CONTENTS 


12.4. Gaussian Quadrature, 524 

12.5. Approximation over an Infinite Interval, 528

12.6. The Method of Laplace, 531

12.7. Multiple Integrals, 533

12.8. The Monte Carlo Method, 535

12.8.1. Variation Reduction, 537

12.8.2. Integrals in Higher Dimensions, 540

12.9. Applications in Statistics, 541

12.9.1. The Gauss-Hermite Quadrature, 542

12.9.2. Minimum Mean Squared Error
Quadrature, 543

12.9.3. Moments of a Ratio of Quadratic Forms, 546

12.9.4. Laplace’s Approximation in Bayesian
Statistics, 548

12.9.5. Other Methods of Approximating Integrals
in Statistics, 549

Further Reading and Annotated Bibliography, 550 
Exercises, 552 

Appendix. Solutions to Selected Exercises 557 

Chapter 1, 557 
Chapter 2, 560 
Chapter 3, 565 
Chapter 4, 570 
Chapter 5, 577 
Chapter 6, 590 
Chapter 7, 600 
Chapter 8, 613 
Chapter 9, 622 
Chapter 10, 627 
Chapter 11, 635 
Chapter 12, 644 

General Bibliography 652 


Index 665



Preface


This edition provides a rather substantial addition to the material covered in
the first edition. The principal difference is the inclusion of three new
chapters, Chapters 10, 11, and 12, in addition to an appendix of solutions to
exercises.

Chapter 10 covers orthogonal polynomials, such as Legendre, Chebyshev,
Jacobi, Laguerre, and Hermite polynomials, and discusses their applications
in statistics. Chapter 11 provides a thorough coverage of Fourier series. The
presentation is done in such a way that a reader with no prior knowledge of
Fourier series can have a clear understanding of the theory underlying the
subject. Several applications of Fourier series in statistics are presented.
Chapter 12 deals with approximation of Riemann integrals. It gives an
exposition of methods for approximating integrals, including those that are
multidimensional. Applications of some of these methods in statistics
are discussed. This subject area has recently gained prominence in several
fields of science and engineering, and, in particular, Bayesian statistics. The
material should be helpful to readers who may be interested in pursuing
further studies in this area.

A significant addition is the inclusion of a major appendix that gives
detailed solutions to the vast majority of the exercises in Chapters 1-12. This
supplement was prepared in response to numerous suggestions by users of
the first edition. The solutions should also be helpful in getting a better
understanding of the various topics covered in the book.

In addition to the aforementioned material, several new exercises were
added to some of the chapters in the first edition. Chapter 1 was expanded by
the inclusion of some basic topological concepts. Chapter 9 was modified to
accommodate Chapter 10. The changes in the remaining chapters, 2 through
8, are very minor. The general bibliography was updated.

The choice of the new chapters was motivated by the evolution of the field
of statistics and the growing needs of statisticians for mathematical tools
beyond the realm of advanced calculus. This is certainly true in topics
concerning approximation of integrals and distribution functions, stochastic
processes, time series analysis, and the modeling of periodic response
functions, to name just a few.

The book is self-contained. It can be used as a text for a two-semester
course in advanced calculus and introductory mathematical analysis.
Chapters 1-7 may be covered in one semester, and Chapters 8-12 in the other
semester. With its coverage of a wide variety of topics, the book can also
serve as a reference for statisticians, and others, who need an adequate
knowledge of mathematics, but do not have the time to wade through the
myriad mathematics books. It is hoped that the inclusion of a separate
section on applications in statistics in every chapter will provide a good
motivation for learning the material in the book. This represents a
continuation of the practice followed in the first edition.

As with the first edition, the book is intended as much for mathematicians
as for statisticians. It can easily be turned into a pure mathematics book by
simply omitting the section on applications in statistics in a given chapter.
Mathematicians, however, may find the sections on applications in statistics
to be quite useful, particularly to mathematics students seeking an
interdisciplinary major. Such a major is becoming increasingly popular in many circles.
In addition, several topics are included here that are not usually found in a
typical advanced calculus book, such as approximation of functions and
integrals, Fourier series, and orthogonal polynomials. The fields of
mathematics and statistics are becoming increasingly intertwined, making any
separation of the two unpropitious. The book represents a manifestation of
the interdependence of the two fields.

The mathematics background needed for this edition is the same as for
the first edition. For readers interested in statistical applications, a
background in introductory mathematical statistics will be helpful, but not
absolutely essential. The annotated bibliography in each chapter can be consulted
for additional readings.

I am grateful to all those who provided comments and helpful suggestions
concerning the first edition, and to my wife Ronnie for her help and support.

André I. Khuri 


Gainesville, Florida 



Preface to the First Edition


The most remarkable mathematical achievement of the seventeenth century
was the invention of calculus by Isaac Newton (1642-1727) and Gottfried
Wilhelm Leibniz (1646-1716). It has since played a significant role in all
fields of science, serving as its principal quantitative language. There is hardly
any scientific discipline that does not require a good knowledge of calculus.
The field of statistics is no exception.

Advanced calculus has had a fundamental and seminal role in the
development of the basic theory underlying statistical methodology. With the rapid
growth of statistics as a discipline, particularly in the last three decades,
knowledge of advanced calculus has become imperative for understanding
the recent advances in this field. Students as well as research workers in
statistics are expected to have a certain level of mathematical sophistication
in order to cope with the intricacies necessitated by the emergence of new
statistical methodologies.

This book has two purposes. The first is to provide beginning graduate 
students in statistics with the basic concepts of advanced calculus. A high 
percentage of these students have undergraduate training in disciplines other
than mathematics with only two or three introductory calculus courses. They 
are, in general, not adequately prepared to pursue an advanced graduate 
degree in statistics. This book is designed to fill the gaps in their mathemati- 
cal training and equip them with the advanced calculus tools needed in their 
graduate work. It can also provide the basic prerequisites for more advanced 
courses in mathematics. 

One salient feature of this book is the inclusion of a complete section in
each chapter describing applications in statistics of the material given in the
chapter. Furthermore, a large segment of Chapter 8 is devoted to the
important problem of optimization in statistics. The purpose of these
applications is to help motivate the learning of advanced calculus by showing its
relevance in the field of statistics. There are many advanced calculus books
designed for engineers or business majors, but there are none for statistics
majors. This is the first advanced calculus book to emphasize applications in
statistics.

The scope of this book is not limited to serving the needs of statistics 
graduate students. Practicing statisticians can use it to sharpen their mathe- 
matical skills, or they may want to keep it as a handy reference for their 
research work. These individuals may be interested in the last three chapters, 
particularly Chapters 8 and 9, which include a large number of citations of 
statistical papers. 

The second purpose of the book concerns mathematics majors. The book’s
thorough and rigorous coverage of advanced calculus makes it quite suitable
as a text for juniors or seniors. Chapters 1 through 7 can be used for this
purpose. The instructor may choose to omit the last section in each chapter,
which pertains to statistical applications. Students may benefit, however,
from the exposure to these additional applications. This is particularly true
given that the trend today is to allow the undergraduate student to have a
major in mathematics with a minor in some other discipline. In this respect,
the book can be particularly useful to those mathematics students who may
be interested in a minor in statistics.

Other features of this book include a detailed coverage of optimization 
techniques and their applications in statistics (Chapter 8), and an introduc- 
tion to approximation theory (Chapter 9). In addition, an annotated bibliog- 
raphy is given at the end of each chapter. This bibliography can help direct 
the interested reader to other sources in mathematics and statistics that are 
relevant to the material in a given chapter. A general bibliography is 
provided at the end of the book. There are also many examples and exercises 
in mathematics and statistics in every chapter. The exercises are classified by 
discipline (mathematics and statistics) for the benefit of the student and the 
instructor. 

The reader is assumed to have a mathematical background that is usually
obtained in the freshman-sophomore calculus sequence. A prerequisite for
understanding the statistical applications in the book is an introductory
statistics course. Obviously, those not interested in such applications need
not worry about this prerequisite. Readers who do not have any background
in statistics, but are nevertheless interested in the application sections, can
make use of the annotated bibliography in each chapter for additional
reading.

The book contains nine chapters. Chapters 1-7 cover the main topics in
advanced calculus, while Chapters 8 and 9 include more specialized subject
areas. More specifically, Chapter 1 introduces the basic elements of set
theory. Chapter 2 presents some fundamental concepts concerning vector
spaces and matrix algebra. The purpose of this chapter is to facilitate the
understanding of the material in the remaining chapters, particularly, in
Chapters 7 and 8. Chapter 3 discusses the concepts of limits and continuity of
functions. The notion of differentiation is studied in Chapter 4. Chapter 5
covers the theory of infinite sequences and series. Integration of functions is
the theme of Chapter 6. Multidimensional calculus is introduced in Chapter
7. This chapter provides an extension of the concepts of limits, continuity,
differentiation, and integration to functions of several variables
(multivariable functions). Chapter 8 consists of two parts. The first part presents an
overview of the various methods of optimization of multivariable functions
whose optima cannot be obtained explicitly by standard advanced calculus
techniques. The second part discusses a variety of topics of interest to
statisticians. The common theme among these topics is optimization. Finally,
Chapter 9 deals with the problem of approximation of continuous functions
with polynomial and spline functions. This chapter is of interest to both
mathematicians and statisticians and contains a wide variety of applications
in statistics.

I am grateful to the University of Florida for granting me a sabbatical 
leave that made it possible for me to embark on the project of writing this 
book. I would also like to thank Professor Rocco Ballerini at the University
of Florida for providing me with some of the exercises used in Chapters 3, 4,
5, and 6.


André I. Khuri 


Gainesville, Florida 




CHAPTER 1 


An Introduction to Set Theory 


The origin of the modern theory of sets can be traced back to the Russian-born
German mathematician Georg Cantor (1845-1918). This chapter introduces
the basic elements of this theory.


1.1. THE CONCEPT OF A SET 

A set is any collection of well-defined and distinguishable objects. These
objects are called the elements, or members, of the set and are denoted by
lowercase letters. Thus a set can be perceived as a collection of elements
united into a single entity. Georg Cantor stressed this in the following words:
“A set is a multitude conceived of by us as a one.”

If x is an element of a set A, then this fact is denoted by writing $x \in A$.
If, however, x is not an element of A, then we write $x \notin A$. Curly brackets
are usually used to describe the contents of a set. For example, if a set A
consists of the elements $x_1, x_2, \ldots, x_n$, then it can be represented as
$A = \{x_1, x_2, \ldots, x_n\}$. In the event membership in a set is determined by the
satisfaction of a certain property or a relationship, then the description of the
same can be given within the curly brackets. For example, if A consists of all
real numbers x such that $x^2 > 1$, then it can be expressed as $A = \{x \mid x^2 > 1\}$,
where the bar | is used simply to mean “such that.” The definition of sets in
this manner is based on the axiom of abstraction, which states that given any
property, there exists a set whose elements are just those entities having that
property.
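In a language with set comprehensions, the axiom of abstraction has a close finite analog. The Python sketch below is only an illustration: a computer cannot enumerate all real numbers, so it assumes a hypothetical finite candidate pool, the integers from -5 to 5, and keeps those x with $x^2 > 1$. The helper name `abstraction` is invented for this example.

```python
# Assumption: only a finite pool can be enumerated, so the integers -5..5
# stand in for "all real numbers" in the set-builder example {x | x^2 > 1}.
candidates = range(-5, 6)

def abstraction(pool, prop):
    """Return the set of elements of `pool` that have the property `prop`."""
    return {x for x in pool if prop(x)}

A = abstraction(candidates, lambda x: x ** 2 > 1)
print(sorted(A))     # [-5, -4, -3, -2, 2, 3, 4, 5]
print(3 in A)        # True:  3 is an element of A
print(1 in A)        # False: 1 is not an element of A
```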

Definition 1.1.1. The set that contains no elements is called the empty set
and is denoted by $\emptyset$. □

Definition 1.1.2. A set A is a subset of another set B, written symbolically
as $A \subset B$, if every element of A is an element of B. If B contains at
least one element that is not in A, then A is said to be a proper subset of B.
□




Definition 1.1.3. A set A and a set B are equal if $A \subset B$ and $B \subset A$.
Thus, every element of A is an element of B and vice versa. □

Definition 1.1.4. The set that contains all sets under consideration in a
certain study is called the universal set and is denoted by $\Omega$. □


1.2. SET OPERATIONS 

There are two basic operations for sets that produce new sets from existing 
ones. They are the operations of union and intersection. 

Definition 1.2.1. The union of two sets A and B, denoted by $A \cup B$, is
the set of elements that belong to either A or B, that is,

$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$. □

This definition can be extended to more than two sets. For example, if
$A_1, A_2, \ldots, A_n$ are n given sets, then their union, denoted by $\bigcup_{i=1}^{n} A_i$, is a set
such that x is an element of it if and only if x belongs to at least one of the
$A_i$ ($i = 1, 2, \ldots, n$).

Definition 1.2.2. The intersection of two sets A and B, denoted by
$A \cap B$, is the set of elements that belong to both A and B. Thus

$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$. □

This definition can also be extended to more than two sets. As before, if
$A_1, A_2, \ldots, A_n$ are n given sets, then their intersection, denoted by $\bigcap_{i=1}^{n} A_i$,
is the set consisting of all elements that belong to all the $A_i$ ($i = 1, 2, \ldots, n$).
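For finite sets, the n-fold union and intersection can be computed by folding the two-set operations over the list $A_1, A_2, \ldots, A_n$. A minimal Python sketch using the built-in `set` type; the helper names `union_all` and `intersection_all` are hypothetical, and the intersection assumes at least one set is given.

```python
from functools import reduce

def union_all(sets):
    """Union of A_1, ..., A_n: elements belonging to at least one A_i."""
    return reduce(lambda s, t: s | t, sets, set())

def intersection_all(sets):
    """Intersection of A_1, ..., A_n (n >= 1): elements belonging to every A_i."""
    sets = list(sets)
    if not sets:
        raise ValueError("need at least one set")
    return reduce(lambda s, t: s & t, sets[1:], set(sets[0]))

family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]   # A_1, A_2, A_3
print(sorted(union_all(family)))             # [1, 2, 3, 4, 5]
print(sorted(intersection_all(family)))      # [3]
```

Copying the first set before folding keeps `intersection_all` from mutating the caller's data.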

Definition 1.2.3. Two sets A and B are disjoint if their intersection is the
empty set, that is, $A \cap B = \emptyset$. □

Definition 1.2.4. The complement of a set A, denoted by $\bar{A}$, is the set
consisting of all elements in the universal set that do not belong to A. In
other words, $x \in \bar{A}$ if and only if $x \notin A$.

The complement of A with respect to a set B is the set $B - A$, which
consists of the elements of B that do not belong to A. This complement is
called the relative complement of A with respect to B. □
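Python's set difference operator yields both kinds of complement, provided a finite universal set is fixed in advance. In this sketch the universal set `OMEGA = {1, ..., 10}` and the helper names are assumptions chosen for illustration.

```python
# Assumed finite universal set, chosen only for illustration.
OMEGA = frozenset(range(1, 11))

def complement(A, universe=OMEGA):
    """Complement of A: elements of the universal set that are not in A."""
    if not A <= universe:
        raise ValueError("A must be a subset of the universal set")
    return set(universe - A)

def relative_complement(A, B):
    """Relative complement of A with respect to B, i.e. B - A."""
    return B - A

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(sorted(complement(A)))              # [5, 6, 7, 8, 9, 10]
print(sorted(relative_complement(A, B)))  # [5, 6]
```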

From Definitions 1.1.1-1.1.4 and 1.2.1-1.2.4, the following results can be
concluded:

Result 1.2.1. The empty set $\emptyset$ is a subset of every set. To show this,
suppose that A is any set. If it is false that $\emptyset \subset A$, then there must be an
element in $\emptyset$ which is not in A. But this is not possible, since $\emptyset$ is empty. It
is therefore true that $\emptyset \subset A$.

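Result 1.2.2. The empty set $\emptyset$ is unique. To prove this, suppose that $\emptyset_1$
and $\emptyset_2$ are two empty sets. Then, by the previous result, $\emptyset_1 \subset \emptyset_2$ and
$\emptyset_2 \subset \emptyset_1$. Hence, $\emptyset_1 = \emptyset_2$.

Result 1.2.3. The complement of $\emptyset$ is $\Omega$. Vice versa, the complement
of $\Omega$ is $\emptyset$.

Result 1.2.4. The complement of $\bar{A}$ is A.

Result 1.2.5. For any set A, $A \cup \bar{A} = \Omega$ and $A \cap \bar{A} = \emptyset$.

Result 1.2.6. $A - B = A - A \cap B$.

Result 1.2.7. $A \cup (B \cup C) = (A \cup B) \cup C$.

Result 1.2.8. $A \cap (B \cap C) = (A \cap B) \cap C$.

Result 1.2.9. $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

Result 1.2.10. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

Result 1.2.11. $\overline{(A \cup B)} = \bar{A} \cap \bar{B}$. More generally, $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i$.

Result 1.2.12. $\overline{(A \cap B)} = \bar{A} \cup \bar{B}$. More generally, $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i$.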
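Results 1.2.6 and 1.2.9-1.2.12 can be sanity-checked by brute force over every subset of a small universal set. This is an exhaustive check on one assumed example, $\Omega = \{1, 2, 3, 4\}$, not a proof; the general De Morgan laws are tested only for n = 3. The helper names are hypothetical.

```python
from itertools import combinations, product

OMEGA = frozenset({1, 2, 3, 4})   # assumed small universal set

def subsets(universe):
    """All 2**len(universe) subsets of a finite universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(A):
    """Complement of A relative to OMEGA."""
    return OMEGA - A

def check_laws():
    """Test Results 1.2.6 and 1.2.9-1.2.12 on every triple of subsets."""
    S = subsets(OMEGA)
    for A, B, C in product(S, repeat=3):
        assert A - B == A - (A & B)                              # 1.2.6
        assert A | (B & C) == (A | B) & (A | C)                  # 1.2.9
        assert A & (B | C) == (A & B) | (A & C)                  # 1.2.10
        assert comp(A | B | C) == comp(A) & comp(B) & comp(C)    # 1.2.11, n = 3
        assert comp(A & B & C) == comp(A) | comp(B) | comp(C)    # 1.2.12, n = 3
    return len(S)

print(check_laws())  # 16 subsets, so 16**3 triples were checked
```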

Definition 1.2.5. Let A and B be two sets. Their Cartesian product,
denoted by $A \times B$, is the set of all ordered pairs $(a, b)$ such that $a \in A$ and
$b \in B$, that is,

$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}$.

The word “ordered” means that if a and c are elements in A and b and d
are elements in B, then $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$. □

The preceding definition can be extended to more than two sets. For
example, if $A_1, A_2, \ldots, A_n$ are n given sets, then their Cartesian product is
denoted by $\times_{i=1}^{n} A_i$ and defined by

$\times_{i=1}^{n} A_i = \{(a_1, a_2, \ldots, a_n) \mid a_i \in A_i, \; i = 1, 2, \ldots, n\}$.

Here, $(a_1, a_2, \ldots, a_n)$, called an ordered n-tuple, represents a generalization
of the ordered pair. In particular, if the $A_i$ are equal to A for
$i = 1, 2, \ldots, n$, then one writes $A^n$ for $\times_{i=1}^{n} A$.
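For finite sets, `itertools.product` from the Python standard library enumerates the ordered n-tuples of a Cartesian product. The wrapper `cartesian` and the sets used below are illustrative assumptions.

```python
from itertools import product

def cartesian(*sets):
    """Cartesian product A_1 x ... x A_n as a set of ordered n-tuples."""
    return set(product(*sets))

A = {"a", "b"}
B = {1, 2, 3}
AxB = cartesian(A, B)
print(len(AxB))          # 6, that is, |A| * |B|
print(("a", 1) in AxB)   # True
print((1, "a") in AxB)   # False: the pair is ordered

A3 = cartesian(A, A, A)  # A^3
print(len(A3))           # 8
```

Python tuples compare element by element, which matches the rule that $(a, b) = (c, d)$ exactly when $a = c$ and $b = d$.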

The following results can be easily verified: 

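Result 1.2.13. $A \times B = \emptyset$ if and only if $A = \emptyset$ or $B = \emptyset$.

Result 1.2.14. $(A \cup B) \times C = (A \times C) \cup (B \times C)$.

Result 1.2.15. $(A \cap B) \times C = (A \times C) \cap (B \times C)$.

Result 1.2.16. $(A \times B) \cap (C \times D) = (A \cap C) \times (B \cap D)$.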
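Results 1.2.13-1.2.16 can likewise be checked exhaustively on a small example. This Python sketch assumes the three-element ground set {1, 2, 3} and tests every quadruple of its subsets; it is a finite check, not a proof, and the helper names are hypothetical.

```python
from itertools import combinations, product

GROUND = (1, 2, 3)   # assumed small ground set

def subsets(items):
    """All subsets of a finite collection of items."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def times(X, Y):
    """Cartesian product X x Y as a frozenset of ordered pairs."""
    return frozenset(product(X, Y))

def check_product_laws():
    """Test Results 1.2.13-1.2.16 on every quadruple of subsets of GROUND."""
    S = subsets(GROUND)
    for A, B, C, D in product(S, repeat=4):
        assert (times(A, B) == frozenset()) == (not A or not B)   # 1.2.13
        assert times(A | B, C) == times(A, C) | times(B, C)       # 1.2.14
        assert times(A & B, C) == times(A, C) & times(B, C)       # 1.2.15
        assert times(A, B) & times(C, D) == times(A & C, B & D)   # 1.2.16
    return len(S) ** 4

print(check_product_laws())  # 4096 quadruples checked
```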


1.3. RELATIONS AND FUNCTIONS 

Let $A \times B$ be the Cartesian product of two sets, A and B.

Definition 1.3.1. A relation $\rho$ from A to B is a subset of $A \times B$, that is,
$\rho$ consists of ordered pairs $(a, b)$ such that $a \in A$ and $b \in B$. In particular, if
$A = B$, then $\rho$ is said to be a relation in A.

For example, if $A = \{7, 8, 9\}$ and $B = \{7, 8, 9, 10\}$, then $\rho = \{(a, b) \mid a < b,
a \in A, b \in B\}$ is a relation from A to B that consists of the six ordered pairs
(7, 8), (7, 9), (7, 10), (8, 9), (8, 10), and (9, 10).

Whenever $\rho$ is a relation and $(x, y) \in \rho$, then x and y are said to be
$\rho$-related. This is denoted by writing $x \rho y$. □
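The relation in this example can be built directly as a set of ordered pairs, which makes the inclusion $\rho \subset A \times B$ easy to confirm. A short Python sketch; the variable names are illustrative.

```python
A = {7, 8, 9}
B = {7, 8, 9, 10}

# rho = {(a, b) | a < b, a in A, b in B}, stored as a set of ordered pairs
rho = {(a, b) for a in A for b in B if a < b}

print(sorted(rho))
# [(7, 8), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]
print((7, 9) in rho)   # True, so 7 rho 9
print((9, 7) in rho)   # False
```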

Definition 1.3.2. A relation $\rho$ in a set A is an equivalence relation if the
following properties are satisfied:

1. $\rho$ is reflexive, that is, $a \rho a$ for any a in A.

2. $\rho$ is symmetric, that is, if $a \rho b$, then $b \rho a$ for any a, b in A.

3. $\rho$ is transitive, that is, if $a \rho b$ and $b \rho c$, then $a \rho c$ for any a, b, c in A.

If p is an équivalence relation in a set A, then for a given üq in A, the set 


C{üq) = [a ^A\üq pa], 

which consists of ail éléments of A that are p-related to is called an 
équivalence class of a □ 

Result 1.3.1. a e Cia) for any a in A. Thus each element of A is an 
element of an équivalence class. 





Result 1.3.2. If $C(a_1)$ and $C(a_2)$ are two equivalence classes, then either $C(a_1) = C(a_2)$, or $C(a_1)$ and $C(a_2)$ are disjoint subsets.

It follows from Results 1.3.1 and 1.3.2 that if A is a nonempty set, the collection of distinct $\rho$-equivalence classes of A forms a partition of A.

As an example of an equivalence relation, consider that $a \rho b$ if and only if a and b are integers such that $a - b$ is divisible by a fixed nonzero integer n. This is the relation of congruence modulo n in the set of integers and is written symbolically as $a \equiv b \pmod{n}$. Clearly, $a \equiv a \pmod{n}$, since $a - a = 0$ is divisible by n. Also, if $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$, since if $a - b$ is divisible by n, then so is $b - a$. Furthermore, if $a \equiv b \pmod{n}$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$. This is true because if $a - b$ and $b - c$ are both divisible by n, then so is $(a - b) + (b - c) = a - c$. Now, if $a_0$ is a given integer, then a $\rho$-equivalence class of $a_0$ consists of all integers that can be written as $a = a_0 + kn$, where k is an integer. Thus in this example $C(a_0)$ is the set $\{a_0 + kn \mid k \in J\}$, where J denotes the set of all integers.
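Congruence modulo n is easy to explore numerically. The sketch below, with hypothetical helper names `congruent` and `classes`, checks the three properties of Definition 1.3.2 on a finite window of integers (a finite check only, not a proof) and lists the equivalence classes that the window meets; each class has the form $\{a_0 + kn\}$. It relies on Python's `%` operator returning a result with the sign of the divisor, so `a % abs(n)` is always one of 0, 1, ..., |n| - 1.

```python
def congruent(a, b, n):
    """a ≡ b (mod n): the nonzero integer n divides a - b."""
    return (a - b) % n == 0

def classes(S, n):
    """Group the integers in S by their remainder modulo |n|."""
    parts = {}
    for a in sorted(S):
        parts.setdefault(a % abs(n), []).append(a)
    return list(parts.values())

S, n = range(-6, 7), 3
assert all(congruent(a, a, n) for a in S)                     # reflexive
assert all(congruent(b, a, n)                                 # symmetric
           for a in S for b in S if congruent(a, b, n))
assert all(congruent(a, c, n)                                 # transitive
           for a in S for b in S for c in S
           if congruent(a, b, n) and congruent(b, c, n))

for cls in classes(S, n):
    print(cls)
# [-6, -3, 0, 3, 6]
# [-5, -2, 1, 4]
# [-4, -1, 2, 5]
```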

Definition 1.3.3. Let $\rho$ be a relation from A to B. Suppose that $\rho$ has the property that for all x in A, if $x \rho y$ and $x \rho z$, where y and z are elements in B, then $y = z$. Such a relation is called a function. □

Thus a function is a relation $\rho$ such that any two elements in B that are $\rho$-related to the same x in A must be identical. In other words, to each element x in A, there corresponds only one element y in B. We call y the value of the function at x and denote it by writing $y = f(x)$. The set A is called the domain of the function f, and the set of all values of $f(x)$ for x in A is called the range of f, or the image of A under f, and is denoted by $f(A)$. In this case, we say that f is a function, or a mapping, from A into B. We express this fact by writing $f: A \to B$. Note that $f(A)$ is a subset of B. In particular, if $B = f(A)$, then f is said to be a function from A onto B. In this case, every element b in B has a corresponding element a in A such that $b = f(a)$.
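Definition 1.3.3 can be checked mechanically for a finite relation stored as a set of pairs. The sketch below (hypothetical helper names `is_function` and `image`) rejects a relation as soon as some x is related to two different y's, and computes the range $f(A)$ of a relation that is a function. The relation `sq` is a finite piece of the relation $a = b^2$ of Example 1.3.1, and `cube` is an arbitrary function chosen for contrast.

```python
def is_function(rho):
    """True if no x is rho-related to two different y's (Definition 1.3.3)."""
    value = {}
    for x, y in rho:
        # setdefault stores y the first time x is seen, else returns the stored value
        if value.setdefault(x, y) != y:
            return False
    return True

def image(rho):
    """The range f(A): the set of all values f(x)."""
    return {y for _, y in rho}

sq = {(b * b, b) for b in range(-2, 3)}      # a rho b iff a = b**2
cube = {(a, a ** 3) for a in range(-2, 3)}   # b = a**3, one b for each a

print(is_function(sq), is_function(cube))  # False True
print(sorted(image(cube)))                 # [-8, -1, 0, 1, 8]
```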

Definition 1.3.4. A function f defined on a set A is said to be a one-to-one function if whenever $f(x_1) = f(x_2)$ for $x_1, x_2$ in A, one has $x_1 = x_2$. Equivalently, f is a one-to-one function if whenever $x_1 \neq x_2$, one has $f(x_1) \neq f(x_2)$. □

Thus a function $f: A \to B$ is one-to-one if to each y in $f(A)$, there corresponds only one element x in A such that $y = f(x)$. In particular, if f is a one-to-one and onto function, then it is said to provide a one-to-one correspondence between A and B. In this case, the sets A and B are said to be equivalent. This fact is denoted by writing $A \sim B$.

Note that whenever $A \sim B$, there is a function $g: B \to A$ such that if $y = f(x)$, then $x = g(y)$. The function g is called the inverse function of f and is denoted by $f^{-1}$. It is easy to see that $\sim$ defines an equivalence relation. Properties 1 and 2 in Definition 1.3.2 are obviously true here. As for property 3, if A, B, and C are sets such that $A \sim B$ and $B \sim C$, then $A \sim C$. To show this, let $f: A \to B$ and $h: B \to C$ be one-to-one and onto functions. Then the composite function $h \circ f$, where $(h \circ f)(x) = h[f(x)]$, defines a one-to-one correspondence between A and C.
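For finite sets, the notions of one-to-one, onto, inverse, and composition can be illustrated by storing a function as a Python `dict`. The sets, mappings, and helper names below are hypothetical illustrative choices; the checks simply mirror the argument that $h \circ f$ is a one-to-one correspondence between A and C.

```python
def is_one_to_one(f):
    """A dict f is one-to-one when no two keys share a value."""
    return len(set(f.values())) == len(f)

def is_onto(f, B):
    """f maps its domain onto B when its set of values equals B."""
    return set(f.values()) == set(B)

def inverse(f):
    """g = f^{-1}: g(y) = x whenever y = f(x); requires f one-to-one."""
    if not is_one_to_one(f):
        raise ValueError("f is not one-to-one")
    return {y: x for x, y in f.items()}

def compose(h, f):
    """(h o f)(x) = h[f(x)] for every x in the domain of f."""
    return {x: h[f[x]] for x in f}

A, B, C = {1, 2, 3}, {"a", "b", "c"}, {10, 20, 30}
f = {1: "b", 2: "c", 3: "a"}      # f: A -> B, one-to-one and onto
h = {"a": 30, "b": 10, "c": 20}   # h: B -> C, one-to-one and onto

hf = compose(h, f)
print(hf)                                  # {1: 10, 2: 20, 3: 30}
print(is_one_to_one(hf), is_onto(hf, C))   # True True
print(compose(inverse(f), f))              # {1: 1, 2: 2, 3: 3}
```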

Example 1.3.1. The relation $a \rho b$, where a and b are real numbers such that $a = b^2$, is not a function. This is true because, for any $b \neq 0$, both pairs $(a, b)$ and $(a, -b)$ belong to $\rho$.

Example 1.3.2. The relation $a \rho b$, where a and b are real numbers such that $b = 2a^2 + 1$, is a function, since for each a, there is only one b that is $\rho$-related to a.

Example 1.3.3. Let $A = \{x \mid -1 \le x \le 1\}$, $B = \{x \mid 0 \le x \le 2\}$. Define $f: A \to B$ such that $f(x) = x^2$. Here, f is a function, but is not one-to-one because $f(1) = f(-1) = 1$. Also, f does not map A onto B, since $y = 2$ has no corresponding x in A such that $x^2 = 2$.

Example 1.3.4. Consider the relation $x \rho y$, where $y = \arcsin x$, $-1 \le x \le 1$. Here, y is an angle measured in radians whose sine is x. Since there are infinitely many angles with the same sine, $\rho$ is not a function. However, if we restrict the range of y to the set $B = \{y \mid -\pi/2 \le y \le \pi/2\}$, then $\rho$ becomes a function, which is also one-to-one and onto. This function is the inverse of the sine function $x = \sin y$. We refer to the values of y that belong to the set B as the principal values of $\arcsin x$, which we denote by writing $y = \operatorname{Arcsin} x$. Note that other functions could have also been defined from the arcsine relation. For example, if $\pi/2 \le y \le 3\pi/2$, then $x = \sin y = -\sin z$, where $z = y - \pi$. Since $-\pi/2 \le z \le \pi/2$, then $z = -\operatorname{Arcsin} x$. Thus $y = \pi - \operatorname{Arcsin} x$ maps the set $A = \{x \mid -1 \le x \le 1\}$ in a one-to-one manner onto the set $C = \{y \mid \pi/2 \le y \le 3\pi/2\}$.
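Python's `math.asin` returns principal values, that is, values of Arcsin x in $[-\pi/2, \pi/2]$. The short sketch below (the helper name `arcsin_branches` is hypothetical) also evaluates the second branch $y = \pi - \operatorname{Arcsin} x$ and confirms, up to floating-point rounding, that both values lie in the stated intervals and have sine equal to x.

```python
import math

EPS = 1e-12  # tolerance for floating-point rounding

def arcsin_branches(x):
    """Return (Arcsin x, pi - Arcsin x) for -1 <= x <= 1."""
    y_principal = math.asin(x)        # principal value, in [-pi/2, pi/2]
    y_other = math.pi - y_principal   # branch with values in [pi/2, 3pi/2]
    return y_principal, y_other

for x in (-1.0, -0.5, 0.0, 0.3, 1.0):
    y1, y2 = arcsin_branches(x)
    assert -math.pi / 2 - EPS <= y1 <= math.pi / 2 + EPS
    assert math.pi / 2 - EPS <= y2 <= 3 * math.pi / 2 + EPS
    assert math.isclose(math.sin(y1), x, abs_tol=EPS)
    assert math.isclose(math.sin(y2), x, abs_tol=EPS)
```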


1.4. FINITE, COUNTABLE, AND UNCOUNTABLE SETS 

Let $I_n = \{1, 2, \ldots, n\}$ be a set consisting of the first n positive integers, and let $J^+$ denote the set of all positive integers.

Definition 1.4.1. A set A is said to be:

1. Finite if $A \sim I_n$ for some positive integer n.

2. Countable if $A \sim J^+$. In this case, the set $J^+$, or any other set equivalent to it, can be used as an index set for A, that is, the elements of A are assigned distinct indices (subscripts) that belong to $J^+$. Hence, A can be represented as $A = \{a_1, a_2, \ldots, a_n, \ldots\}$.

3. Uncountable if A is neither finite nor countable. In this case, the elements of A cannot be indexed by $I_n$ for any n, or by $J^+$. □

Example 1.4.1. Let $A = \{1, 4, 9, \ldots, n^2, \ldots\}$. This set is countable, since the function $f: J^+ \to A$ defined by $f(n) = n^2$ is one-to-one and onto. Hence, $A \sim J^+$.


Example 1.4.2. Let $A = J$ be the set of all integers. Then A is countable. To show this, consider the function $f: J^+ \to A$ defined by

$$f(n) = \begin{cases} (n + 1)/2, & n \text{ odd}, \\ (2 - n)/2, & n \text{ even}. \end{cases}$$

It can be verified that f is one-to-one and onto. Hence, $A \sim J^+$.
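The function of Example 1.4.2 can be tabulated directly. In the sketch below, integer division `//` is exact because n + 1 is even when n is odd and 2 - n is even when n is even. The first 2K values are exactly the integers from -(K - 1) to K, each appearing once, which is a finite shadow of f being one-to-one and onto; K = 50 is an arbitrary choice.

```python
def f(n):
    """f: J+ -> J from Example 1.4.2."""
    return (n + 1) // 2 if n % 2 == 1 else (2 - n) // 2

print([f(n) for n in range(1, 10)])   # [1, 0, 2, -1, 3, -2, 4, -3, 5]

K = 50
values = [f(n) for n in range(1, 2 * K + 1)]
assert len(set(values)) == len(values)                # no value repeats
assert set(values) == set(range(-(K - 1), K + 1))     # every integer in the window
```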


Example 1.4.3. Let $A = \{x \mid 0 < x < 1\}$. This set is uncountable. To show this, suppose that there exists a one-to-one correspondence between $J^+$ and A. We can then write $A = \{a_1, a_2, \ldots, a_n, \ldots\}$. Let the digit in the nth decimal place of $a_n$ be denoted by $b_n$ ($n = 1, 2, \ldots$). Define a number c as $c = 0.c_1 c_2 \cdots c_n \cdots$ such that for each n, $c_n = 1$ if $b_n \neq 1$ and $c_n = 2$ if $b_n = 1$. Now, c belongs to A, since $0 < c < 1$. However, by construction, c is different from every $a_i$ in at least one decimal digit ($i = 1, 2, \ldots$), and since all the digits of c are 1s and 2s, its decimal expansion is unique, so $c \neq a_i$ for every i. Hence $c \notin A$, which is a contradiction. Therefore, A is not countable. Since A is not finite either, it must be uncountable.
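The diagonal construction in Example 1.4.3 can be carried out on any finite list of decimal expansions. The Python sketch below treats each $a_n$ as a string of digits after the decimal point (an illustrative representation, assumed long enough) and builds c digit by digit; the seven sample expansions are arbitrary. A finite list can only illustrate the mechanism, since the argument itself concerns an infinite listing.

```python
def diagonal(expansions):
    """Digits of c: c_n = 2 if the n-th digit of a_n is 1, else c_n = 1."""
    digits = []
    for n, a in enumerate(expansions):
        b_n = a[n]              # n-th decimal digit of a_n (0-based index here)
        digits.append("2" if b_n == "1" else "1")
    return "".join(digits)

listing = ["1415926", "7182818", "4142135", "5000000",
           "3333333", "1111111", "6180339"]
c = diagonal(listing)
print("0." + c)   # 0.2211121

# c differs from the n-th listed number in its n-th decimal place
assert all(c[n] != a[n] for n, a in enumerate(listing))
```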

This result implies that any subset of R, the set of real numbers, that contains A, or is equivalent to it, must be uncountable. In particular, R is uncountable.


Theorem 1.4.1. Every infinite subset of a countable set is countable.

Proof. Let A be a countable set, and B be an infinite subset of A. Then $A = \{a_1, a_2, \ldots, a_n, \ldots\}$, where the $a_i$'s are distinct elements. Let $n_1$ be the smallest positive integer such that $a_{n_1} \in B$. Let $n_2 > n_1$ be the next smallest integer such that $a_{n_2} \in B$. In general, if $n_1 < n_2 < \cdots < n_{k-1}$ have been chosen, let $n_k$ be the smallest integer greater than $n_{k-1}$ such that $a_{n_k} \in B$. Define the function $f: J^+ \to B$ such that $f(k) = a_{n_k}$, $k = 1, 2, \ldots$. This function is one-to-one and onto. Hence, B is countable. □
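The index-selection step in the proof of Theorem 1.4.1 translates into a Python generator. In this sketch, A is assumed to be given by a function `a(n)` returning $a_n$ and B by a membership test `in_B`; both are hypothetical stand-ins. The generator yields $a_{n_1}, a_{n_2}, \ldots$, that is, $f(1), f(2), \ldots$. Because B is assumed infinite, a next element always exists; for a finite B the loop would run forever after the last one.

```python
from itertools import count, islice

def enumerate_subset(a, in_B):
    """Yield a_{n_1}, a_{n_2}, ... where n_1 < n_2 < ... are the indices with a_n in B."""
    for n in count(1):          # scan n = 1, 2, 3, ... in increasing order
        if in_B(a(n)):
            yield a(n)

# A = {1, 4, 9, ...} as in Example 1.4.1, B = the even squares
evens = enumerate_subset(lambda n: n * n, lambda x: x % 2 == 0)
print(list(islice(evens, 5)))   # [4, 16, 36, 64, 100]
```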

Theorem 1.4.2. The union of two countable sets is countable.

Proof. Let A and B be countable sets. Then they can be represented as $A = \{a_1, a_2, \ldots, a_n, \ldots\}$, $B = \{b_1, b_2, \ldots, b_n, \ldots\}$. Define $C = A \cup B$. Consider the following two cases:

i. A and B are disjoint.

ii. A and B are not disjoint.

In case i, let us write C as $C = \{a_1, b_1, a_2, b_2, \ldots, a_n, b_n, \ldots\}$. Consider the function $f: J^+ \to C$ such that

$$f(n) = \begin{cases} a_{(n+1)/2}, & n \text{ odd}, \\ b_{n/2}, & n \text{ even}. \end{cases}$$

It can be verified that f is one-to-one and onto. Hence, C is countable.

Let us now consider case ii. If $A \cap B \neq \emptyset$, then some elements of C, namely those in $A \cap B$, will appear twice in the preceding listing. Deleting the repeated entries shows that there exists a set $E \subset J^+$ such that $E \sim C$. Thus C is either finite or countable. Since $C \supset A$ and A is infinite, C must be countable. □
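The listing $a_1, b_1, a_2, b_2, \ldots$ used in the proof of Theorem 1.4.2, together with the removal of repeats needed in case ii, can be sketched as a Python generator. As before, `a(n)` and `b(n)` are hypothetical functions returning $a_n$ and $b_n$; the set `seen` records what has already been listed, so the elements must be hashable.

```python
from itertools import count, islice

def union_enumeration(a, b):
    """List C = A u B as a_1, b_1, a_2, b_2, ..., skipping elements already listed."""
    seen = set()
    for n in count(1):
        for x in (a(n), b(n)):   # odd positions from A, even positions from B
            if x not in seen:    # repeats occur only when A and B overlap (case ii)
                seen.add(x)
                yield x

# A = even positive integers, B = positive multiples of 3 (they overlap)
c = union_enumeration(lambda n: 2 * n, lambda n: 3 * n)
print(list(islice(c, 10)))   # [2, 3, 4, 6, 9, 8, 12, 10, 15, 18]
```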

Corollary 1.4.1. If $A_1, A_2, \ldots, A_n, \ldots$ are countable sets, then $\bigcup_{i=1}^{\infty} A_i$ is countable.

Proof. The proof is left as an exercise. □ 

Theorem 1.4.3. Let A and B be two countable sets. Then their Cartesian product $A \times B$ is countable.

Proof. Let us write A as $A = \{a_1, a_2, \ldots, a_n, \ldots\}$. For a given $a \in A$, define $(a, B)$ as the set

$$(a, B) = \{(a, b) \mid b \in B\}.$$

Then $(a, B) \sim B$, and hence $(a, B)$ is countable. However,

$$A \times B = \bigcup_{i=1}^{\infty} (a_i, B).$$

Thus by Corollary 1.4.1, $A \times B$ is countable. □
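The proof of Theorem 1.4.3 relies on Corollary 1.4.1. An explicit enumeration is also available: list the index pairs (i, j) of $J^+ \times J^+$ along the diagonals i + j = 2, 3, 4, .... The sketch below (hypothetical helper names) does this and gives the position of (i, j) in the listing, $(i + j - 2)(i + j - 1)/2 + i$, a variant of the Cantor pairing function; each pair having exactly one position is what makes the listing one-to-one and onto.

```python
from itertools import count, islice

def index_pairs():
    """Enumerate J+ x J+ diagonal by diagonal: (1,1), (1,2), (2,1), (1,3), ..."""
    for s in count(2):                 # s = i + j
        for i in range(1, s):
            yield (i, s - i)

def position(i, j):
    """1-based position of (i, j) in the listing produced by index_pairs()."""
    s = i + j
    return (s - 2) * (s - 1) // 2 + i

def product_enumeration(a, b):
    """List A x B as (a_i, b_j) following the diagonal order of the index pairs."""
    for i, j in index_pairs():
        yield (a(i), b(j))

print(list(islice(index_pairs(), 6)))
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```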

Corollary 1.4.2. If $A_1, A_2, \ldots, A_n$ are countable sets, then their Cartesian product $\times_{i=1}^{n} A_i$ is countable.

Proof. The proof is left as an exercise. □

Corollary 1.4.3. The set Q of all rational numbers is countable.

Proof. By definition, a rational number is a number of the form m/n, where m and n are integers with $n \neq 0$. Writing each rational number in lowest terms with $n > 0$ associates it with exactly one such pair, so $Q \sim \tilde{Q}$, where

$$\tilde{Q} = \{(m, n) \mid m, n \text{ are integers, } n > 0, \text{ and } m/n \text{ is in lowest terms}\}.$$

Since $\tilde{Q}$ is an infinite subset of $J \times J$, where J is the set of all integers, which is countable as was seen in Example 1.4.2, then by Theorems 1.4.1 and 1.4.3, $\tilde{Q}$ is countable and so is Q. □

Remark 1.4.1. Any real number that cannot be expressed as a rational number is called an irrational number. For example, $\sqrt{2}$ is an irrational number. To show this, suppose that there exist integers m and n such that $\sqrt{2} = m/n$. We may assume that m/n is written in its lowest terms, that is, m and n have no common factors other than unity. In particular, m and n cannot both be even. Now, $m^2 = 2n^2$. This implies that $m^2$ is even. Hence, m is even (the square of an odd integer is odd) and can therefore be written as $m = 2m'$. It follows that $n^2 = m^2/2 = 2m'^2$. Consequently, $n^2$, and hence n, is even. This contradicts the fact that m and n are not both even. Thus $\sqrt{2}$ must be an irrational number.


1.5. BOUNDED SETS 

Let us consider the set R of real numbers. 

Definition 1.5.1. A set $A \subset R$ is said to be:

1. Bounded from above if there exists a number q such that $x \le q$ for all x in A. This number is called an upper bound of A.

2. Bounded from below if there exists a number p such that $x \ge p$ for all x in A. The number p is called a lower bound of A.

3. Bounded if A has an upper bound q and a lower bound p. In this case, there exists a nonnegative number r such that $-r \le x \le r$ for all x in A; for example, one can take $r = \max(|p|, |q|)$. □

Definition 1.5.2. Let $A \subset R$ be a set bounded from above. If there exists a number $\ell$ that is an upper bound of A and is less than or equal to any other upper bound of A, then $\ell$ is called the least upper bound of A and is denoted by lub(A). Another name for lub(A) is the supremum of A, which is denoted by sup(A). □

Definition 1.5.3. Let $A \subset R$ be a set bounded from below. If there exists a number g that is a lower bound of A and is greater than or equal to any other lower bound of A, then g is called the greatest lower bound of A and is denoted by glb(A). The infimum of A, denoted by inf(A), is another name for glb(A). □

The least upper bound of A, if it exists, is unique, but it may or may not 
belong to A. The same is true for glb(A). The proof of the following theorem 
is omitted and can be found in Rudin (1964, Theorem 1.36). 





Theorem 1.5.1. Let $A \subset R$ be a nonempty set.

1. If A is bounded from above, then lub(A) exists.

2. If A is bounded from below, then glb(A) exists.

Example 1.5.1. Let $A = \{x \mid x < 0\}$. Then lub(A) = 0, which does not belong to A.

Example 1.5.2. Let $A = \{1/n \mid n = 1, 2, \ldots\}$. Then lub(A) = 1 and glb(A) = 0. In this case, lub(A) belongs to A, but glb(A) does not.


1.6. SOME BASIC TOPOLOGICAL CONCEPTS 

The field of topology is an abstract study that evolved as an independent discipline in response to certain problems in classical analysis and geometry. It provides a unifying theory that can be used in many diverse branches of mathematics. In this section, we present a brief account of some basic definitions and results in the so-called point-set topology.

Definition 1.6.1. Let A be a set, and let $\mathcal{T} = \{B_\alpha\}$ be a family of subsets of A. Then $\mathcal{T}$ is a topology in A if it satisfies the following properties:

1. The union of any number of members of $\mathcal{T}$ is also a member of $\mathcal{T}$.

2. The intersection of a finite number of members of $\mathcal{T}$ is also a member of $\mathcal{T}$.

3. Both A and the empty set $\emptyset$ are members of $\mathcal{T}$. □
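For a finite set A, the conditions of Definition 1.6.1 can be checked by brute force. In the sketch below (hypothetical helper name `is_topology`), a family of subsets is stored as a set of `frozenset`s. Because a finite family has only finitely many members, closure under pairwise unions and intersections gives closure under all unions and finite intersections (by induction), and the empty union is covered by requiring $\emptyset$ to be a member.

```python
def is_topology(A, T):
    """Check Definition 1.6.1 for a finite set A and a family T of subsets of A."""
    A = frozenset(A)
    T = {frozenset(U) for U in T}
    if any(not U <= A for U in T):            # members must be subsets of A
        return False
    if A not in T or frozenset() not in T:    # property 3
        return False
    # properties 1 and 2, reduced to pairwise unions and intersections
    return all(U | V in T and U & V in T for U in T for V in T)

A = {1, 2, 3}
print(is_topology(A, [set(), {1}, {1, 2}, A]))   # True
print(is_topology(A, [set(), {1}, {2}, A]))      # False: {1} u {2} is missing
```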

Definition 1.6.2. Let $\mathcal{T}$ be a topology in a set A. Then the pair $(A, \mathcal{T})$ is called a topological space. □

Definition 1.6.3. Let $(A, \mathcal{T})$ be a topological space. Then the members of $\mathcal{T}$ are called the open sets of the topology $\mathcal{T}$. □

Definition 1.6.4. Let $(A, \mathcal{T})$ be a topological space. A neighborhood of a point $p \in A$ is any open set (that is, a member of $\mathcal{T}$) that contains p. In particular, if $A = R$, the set of real numbers, then a neighborhood of p is an open set of the form $N_r(p) = \{q \mid |q - p| < r\}$ for some $r > 0$. □

Definition 1.6.5. Let $(A, \mathcal{T})$ be a topological space. A family $G = \{U_\beta\}$ is called a basis for $\mathcal{T}$ if each open set (that is, member of $\mathcal{T}$) is the union of members of G. □


On the basis of this definition, it is easy to prove the following theorem.





Theorem 1.6.1. Let $(A, \mathcal{T})$ be a topological space, and let G be a basis for $\mathcal{T}$. Then a set $B \subset A$ is open (that is, a member of $\mathcal{T}$) if and only if for each $p \in B$, there is a $U \in G$ such that $p \in U \subset B$.

For example, if $A = R$, then $G = \{N_r(p) \mid p \in R,\ r > 0\}$ is a basis for the topology in R. It follows that a set $B \subset R$ is open if for every point p in B, there exists a neighborhood $N_r(p)$ such that $N_r(p) \subset B$.

Definition 1.6.6. Let $(A, \mathcal{T})$ be a topological space. A set $B \subset A$ is closed if $\bar{B}$, the complement of B with respect to A, is an open set. □

It is easy to show that closed sets of a topological space $(A, \mathcal{T})$ satisfy the following properties:

1. The intersection of any number of closed sets is closed.

2. The union of a finite number of closed sets is closed.

3. Both A and the empty set $\emptyset$ are closed.

Definition 1.6.7. Let $(A, \mathcal{T})$ be a topological space. A point $p \in A$ is said to be a limit point of a set $B \subset A$ if every neighborhood of p contains at least one element of B distinct from p. Thus, if $U(p)$ is any neighborhood of p, then $U(p) \cap B$ is a nonempty set that contains at least one element besides p. In particular, if $A = R$, the set of real numbers, then p is a limit point of a set $B \subset R$ if for any $r > 0$, $N_r(p) \cap [B - \{p\}] \neq \emptyset$, where $\{p\}$ denotes a set consisting of just p. □

Theorem 1.6.2. Let p be a limit point of a set $B \subset R$. Then every neighborhood of p contains infinitely many points of B.

Proof. The proof is left to the reader. □

The next theorem is a fundamental theorem in set theory. It is originally due to Bernhard Bolzano (1781-1848), though its importance was first recognized by Karl Weierstrass (1815-1897). The proof is omitted and can be found, for example, in Zaring (1967, Theorem 4.62).

Theorem 1.6.3 (Bolzano-Weierstrass). Every bounded infinite subset of R, the set of real numbers, has at least one limit point.

Note that a limit point of a set B may not belong to B. For example, the set $B = \{1/n \mid n = 1, 2, \ldots\}$ has a limit point equal to zero, which does not belong to B. It can be seen here that any neighborhood of 0 contains infinitely many points of B. In particular, if r is a given positive number, then all elements of B of the form 1/n, where $n > 1/r$, belong to $N_r(0)$. From Theorem 1.6.2 it can also be concluded that a finite set cannot have limit points.
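The claim that $N_r(0)$ contains all 1/n with $n > 1/r$ can be checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` to avoid rounding; the helper name `first_index_inside` is hypothetical, and r is assumed to be a positive rational. It returns the smallest n with $1/n < r$; every larger n also works, so infinitely many points of B lie in $N_r(0)$, in line with Theorem 1.6.2.

```python
import math
from fractions import Fraction

def first_index_inside(r):
    """Smallest positive integer n with 1/n < r, for a rational r > 0."""
    r = Fraction(r)
    return math.floor(1 / r) + 1          # the first integer exceeding 1/r

for r in (Fraction(1, 10), Fraction(3, 7), Fraction(2)):
    n = first_index_inside(r)
    assert Fraction(1, n) < r                      # 1/n lies in N_r(0)
    assert n == 1 or Fraction(1, n - 1) >= r       # and n is the smallest such index
    print(r, n)
# 1/10 11
# 3/7 3
# 2 1
```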





Limit points can be used to describe closed sets, as can be seen from the 
following theorem. 

Theorem 1.6.4. A set B is closed if and only if every limit point of B belongs to B.

Proof. Suppose that B is closed. Let p be a limit point of B. If $p \notin B$, then $p \in \bar{B}$, which is open. Hence, by Theorem 1.6.1, there exists a neighborhood $U(p)$ of p contained inside $\bar{B}$. This means that $U(p) \cap B = \emptyset$, a contradiction, since p is a limit point of B (see Definition 1.6.7). Therefore, p must belong to B. Conversely, if every limit point of B is in B, then B must be closed. To show this, let p be any point in $\bar{B}$. Then p is not a limit point of B, since $p \notin B$. Therefore, there exists a neighborhood $U(p)$ that contains no point of B other than possibly p itself; since $p \notin B$, this gives $U(p) \subset \bar{B}$. This means that $\bar{B}$ is open, and hence B is closed. □

It should be noted that a set does not have to be either open or closed; if it is closed, it does not have to be open, and vice versa. Also, a set may be both open and closed.

Example 1.6.1. $B = \{x \mid 0 < x < 1\}$ is an open subset of R, but is not closed, since both 0 and 1 are limit points of B, but do not belong to it.

Example 1.6.2. $B = \{x \mid 0 \le x \le 1\}$ is closed, but is not open, since any neighborhood of 0 or 1 is not contained in B.

Example 1.6.3. $B = \{x \mid 0 < x \le 1\}$ is not open, because any neighborhood of 1 is not contained in B. It is also not closed, because 0 is a limit point that does not belong to B.

Example 1.6.4. The set R is both open and closed.

Example 1.6.5. A finite set is closed because it has no limit points, but, if it is nonempty, it is obviously not open.

Definition 1.6.8. A subset B of a topological space $(A, \mathcal{T})$ is disconnected if there exist open subsets C and D of A such that $B \cap C$ and $B \cap D$ are disjoint nonempty sets whose union is B. A set is connected if it is not disconnected. □

The set of all rationals Q is disconnected, since $\{x \mid x > \sqrt{2}\} \cap Q$ and $\{x \mid x < \sqrt{2}\} \cap Q$ are disjoint nonempty sets whose union is Q. On the other hand, all intervals in R (open, closed, or half-open) are connected.

Definition 1.6.9. A collection of sets $\{B_\alpha\}$ is said to be a covering of a set A if the union $\bigcup_\alpha B_\alpha$ contains A. If each $B_\alpha$ is an open set, then $\{B_\alpha\}$ is called an open covering.





Definition 1.6.10. A set A in a topological space is compact if each open covering $\{B_\alpha\}$ of A has a finite subcovering, that is, there is a finite subcollection $B_{\alpha_1}, B_{\alpha_2}, \ldots, B_{\alpha_n}$ of $\{B_\alpha\}$ such that $A \subset \bigcup_{i=1}^{n} B_{\alpha_i}$. □

The concept of compactness is motivated by the classical Heine-Borel theorem, which characterizes compact sets in R, the set of real numbers, as closed and bounded sets.

Theorem 1.6.5 (Heine-Borel). A set $B \subset R$ is compact if and only if it is closed and bounded.

Proof. See, for example, Zaring (1967, Theorem 4.78). □

Thus, according to the Heine-Borel theorem, every closed and bounded interval [a, b] is compact.


1.7. EXAMPLES IN PROBABILITY AND STATISTICS 

Example 1.7.1. In probability theory, events are considered as subsets in a sample space $\Omega$, which consists of all the possible outcomes of an experiment. A Borel field of events (also called a $\sigma$-field) in $\Omega$ is a collection $\mathcal{B}$ of events with the following properties:

i. $\Omega \in \mathcal{B}$.

ii. If $E \in \mathcal{B}$, then $\bar{E} \in \mathcal{B}$, where $\bar{E}$ is the complement of E.

iii. If $E_1, E_2, \ldots, E_n, \ldots$ is a countable collection of events in $\mathcal{B}$, then $\bigcup_{i=1}^{\infty} E_i$ belongs to $\mathcal{B}$.

The probability of an event E is a number denoted by $P(E)$ that has the following properties:

i. $0 \le P(E) \le 1$.

ii. $P(\Omega) = 1$.

iii. If $E_1, E_2, \ldots, E_n, \ldots$ is a countable collection of disjoint events in $\mathcal{B}$, then

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$

By definition, the triple $(\Omega, \mathcal{B}, P)$ is called a probability space.
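On a finite sample space, the power set of $\Omega$ is a $\sigma$-field, and countable additivity reduces to finite additivity, because in any countable collection of disjoint events on a finite $\Omega$ all but finitely many events are empty. The sketch below checks the properties above for one roll of a fair die, an illustrative choice, using exact `Fraction` arithmetic and the equally-likely assignment $P(E) = |E|/|\Omega|$.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))            # outcomes of one roll of a fair die

# The power set of omega, used as the sigma-field of events
field = {frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), k) for k in range(len(omega) + 1))}

def P(E):
    """Equally likely outcomes: P(E) = |E| / |omega|."""
    return Fraction(len(E), len(omega))

assert omega in field                                          # (i)
assert all(omega - E in field for E in field)                  # (ii) complements
assert all(E | F in field for E in field for F in field)       # (iii), finite case
assert all(0 <= P(E) <= 1 for E in field) and P(omega) == 1    # probability (i), (ii)
assert all(P(E | F) == P(E) + P(F)                             # additivity for disjoint E, F
           for E in field for F in field if not E & F)

print(len(field), P(frozenset({2, 4, 6})))   # 64 1/2
```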

Example 1.7.2. A random variable X defined on a probability space $(\Omega, \mathcal{B}, P)$ is a function $X: \Omega \to A$, where A is a nonempty set of real numbers. For any real number x, the set $E = \{\omega \in \Omega \mid X(\omega) \le x\}$ is an element of $\mathcal{B}$. The probability of the event E is called the cumulative distribution function of X and is denoted by $F(x)$. In statistics, it is customary to write just X instead of $X(\omega)$. We thus have

$$F(x) = P(X \le x).$$

This concept can be extended to several random variables: Let $X_1, X_2, \ldots, X_n$ be n random variables. Define the event $A_i = \{\omega \in \Omega \mid X_i(\omega) \le x_i\}$, $i = 1, 2, \ldots, n$. Then $P(\bigcap_{i=1}^{n} A_i)$, which can be expressed as

$$F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n),$$

is called the joint cumulative distribution function of $X_1, X_2, \ldots, X_n$. In this case, the n-tuple $(X_1, X_2, \ldots, X_n)$ is said to have a multivariate distribution.
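The cumulative distribution function and its joint version can be computed by counting outcomes on a small finite probability space. In the sketch below, the sample space is the 36 equally likely outcomes of two fair dice, and $X_1$, $X_2$ are the two faces; these choices are purely illustrative.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes

def X1(w):
    """Face shown by the first die."""
    return w[0]

def X2(w):
    """Face shown by the second die."""
    return w[1]

def F(x):
    """F(x) = P({w : X1(w) <= x})."""
    return Fraction(sum(1 for w in omega if X1(w) <= x), len(omega))

def F_joint(x1, x2):
    """F(x1, x2) = P({w : X1(w) <= x1} n {w : X2(w) <= x2})."""
    hits = sum(1 for w in omega if X1(w) <= x1 and X2(w) <= x2)
    return Fraction(hits, len(omega))

print(F(2.5), F_joint(2, 3))   # 1/3 1/6
print(F(0), F(6))              # 0 1
```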

A random variable X is said to be discrete, or to have a discrete distribution, if its range is finite or countable. For example, the binomial random variable is discrete. It represents the number of successes in a sequence of n independent trials, in each of which there are two possible outcomes: success or failure. The probability of success, denoted by $p_n$, is the same in all the trials. Such a sequence of trials is called a Bernoulli sequence. Thus the possible values of this random variable are 0, 1, ..., n.

Another example of a discrete random variable is the Poisson, whose possible values are 0, 1, 2, .... It is considered to be the limit of a binomial random variable as $n \to \infty$ in such a way that $n p_n \to \lambda > 0$. Other examples of discrete random variables include the discrete uniform, geometric, hypergeometric, and negative binomial (see, for example, Fisz, 1963; Johnson and Kotz, 1969; Lindgren, 1976; Lloyd, 1980).
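The Poisson limit of the binomial can be seen numerically. The sketch below takes $p_n = \lambda/n$, so that $n p_n = \lambda$ exactly, and compares binomial and Poisson probabilities of a single value k; the choices $\lambda = 2$ and $k = 3$ are arbitrary. It uses `math.comb`, available in Python 3.8 and later.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 2.0, 3
for n in (10, 100, 1000, 10000):
    p_n = lam / n                  # chosen so that n * p_n = lam
    print(n, binom_pmf(k, n, p_n))
print("Poisson:", poisson_pmf(k, lam))   # about 0.1804
```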

A random variable X is said to be continuous, or to have a continuous distribution, if its range is an uncountable set, for example, an interval. In this case, the cumulative distribution function F(x) of X is a continuous function of x on the set R of all real numbers. If, in addition, F(x) is differentiable, then its derivative is called the density function of X. One of the best-known continuous distributions is the normal. A number of continuous distributions are derived in connection with it, for example, the chi-squared, F, Rayleigh, and t distributions. Other well-known continuous distributions include the beta, continuous uniform, exponential, and gamma distributions (see, for example, Fisz, 1963; Johnson and Kotz, 1970a, b).


Example 1.7.3. Let $f(x, \theta)$ denote the density function of a continuous random variable X, where $\theta$ represents a set of unknown parameters that identify the distribution of X. The range of X, which consists of all possible values of X, is referred to as a population and denoted by $P_X$. Any subset of n elements from $P_X$ forms a sample of size n. This sample is actually an element in the Cartesian product $P_X^n$. Any real-valued function defined on $P_X^n$ is called a statistic. We denote such a function by $g(X_1, X_2, \ldots, X_n)$, where each $X_i$ has the same distribution as X. Note that this function is a random variable whose values do not depend on $\theta$. For example, the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$ and the sample variance $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$ are statistics. We adopt the convention that whenever a particular sample of size n is chosen (or observed) from $P_X$, the elements in that sample are written using lowercase letters, for example, $x_1, x_2, \ldots, x_n$. The corresponding value of a statistic is written as $g(x_1, x_2, \ldots, x_n)$.
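For an observed sample $x_1, x_2, \ldots, x_n$, the values of the two statistics above are straightforward to compute. The sketch below implements the formulas directly and cross-checks them against Python's standard `statistics` module, whose `variance` function also uses the divisor n - 1; the data values are made up for illustration.

```python
import math
import statistics

def sample_mean(xs):
    """Value of X-bar = sum(x_i) / n for an observed sample."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Value of S^2 = sum((x_i - xbar)^2) / (n - 1); needs n >= 2."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

x = [2.0, 4.0, 4.0, 5.0, 7.0]           # hypothetical observed sample, n = 5
print(sample_mean(x), sample_variance(x))

assert math.isclose(sample_mean(x), statistics.mean(x))
assert math.isclose(sample_variance(x), statistics.variance(x))
```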

Example 1.7.4. Two random variables, X and Y, are said to be equal in distribution if they have the same cumulative distribution function. This fact is denoted by writing $X \stackrel{d}{=} Y$. The same definition applies to random variables with multivariate distributions. We note that $\stackrel{d}{=}$ is an equivalence relation, since it satisfies properties 1, 2, and 3 in Definition 1.3.2. The first two properties are obviously true. As for property 3, if $X \stackrel{d}{=} Y$ and $Y \stackrel{d}{=} Z$, then all three random variables have the same cumulative distribution function, which implies that $X \stackrel{d}{=} Z$. This equivalence relation is useful in nonparametric statistics (see Randles and Wolfe, 1979). For example, it can be shown that if X has a distribution that is symmetric about some number $\mu$, then $X - \mu \stackrel{d}{=} \mu - X$. Also, if $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, and if $(m_1, m_2, \ldots, m_n)$ is any permutation of the n-tuple $(1, 2, \ldots, n)$, then $(X_1, X_2, \ldots, X_n) \stackrel{d}{=} (X_{m_1}, X_{m_2}, \ldots, X_{m_n})$. In this case, we say that the collection of random variables $X_1, X_2, \ldots, X_n$ is exchangeable.

Example 1.7.5. Consider the problem of testing the null hypothesis $H_0: \theta \le \theta_0$ versus the alternative hypothesis $\theta > \theta_0$, where $\theta$ is some unknown parameter that belongs to a set A. Let T be a statistic used in making a decision as to whether $H_0$ should be rejected or not. This statistic is appropriately called a test statistic.

Suppose that $H_0$ is rejected if $T > t$, where t is some real number. Since the distribution of T depends on $\theta$, the probability $P(T > t)$ is a function of $\theta$, which we denote by $\pi(\theta)$. Thus $\pi: A \to [0, 1]$. Let $B_0$ be a subset of A defined as $B_0 = \{\theta \in A \mid \theta \le \theta_0\}$. By definition, the size of the test is the least upper bound of the set $\pi(B_0)$. This probability is denoted by $\alpha$ and is also called the level of significance of the test. We thus have

$$\alpha = \sup_{\theta \le \theta_0} \pi(\theta).$$
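As a concrete, purely illustrative case of Example 1.7.5, suppose T has a binomial distribution with n trials and success probability $\theta$, with $A = [0, 1]$. Then $\pi(\theta) = P(T > t)$ is nondecreasing in $\theta$, so the supremum over $\theta \le \theta_0$ is attained at $\theta_0$. The sketch below approximates $\sup \pi(B_0)$ by maximizing over a grid that includes $\theta_0$; the values n = 20, t = 14, and $\theta_0 = 0.5$ are arbitrary, and the helper name `power` is hypothetical.

```python
import math

def power(theta, n, t):
    """pi(theta) = P(T > t) when T ~ Binomial(n, theta)."""
    return sum(math.comb(n, k) * theta**k * (1 - theta)**(n - k)
               for k in range(t + 1, n + 1))

n, t, theta0 = 20, 14, 0.5
grid = [theta0 * i / 1000 for i in range(1001)]   # points of B0 = [0, theta0]
alpha = max(power(theta, n, t) for theta in grid)
print(alpha)                                      # about 0.0207
```

Here the maximum equals $\pi(\theta_0) = 21700/2^{20}$, as the monotonicity of $\pi$ predicts.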

To learn more about the above examples and others, the interested reader may consider consulting some of the references listed in the annotated bibliography.


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Bronshtein, I. N., and K. A. Semendyayev (1985). Handbook of Mathematics (English translation edited by K. A. Hirsch). Van Nostrand Reinhold, New York. (Section 4.1 in this book gives basic concepts of set theory; Chap. 5 provides a brief introduction to probability and mathematical statistics.)





Dugundji, J. (1966). Topology. Allyn and Bacon, Boston. (Chap. 1 deals with elementary set theory; Chap. 3 presents some basic topological concepts that complement the material given in Section 1.6.)

Fisz, M. (1963). Probability Theory and Mathematical Statistics, 3rd ed. Wiley, New 
York. (Chap. 1 discusses random events and axioms of the theory of probability; 
Chap. 2 introduces the concept of a random variable; Chap. 5 investigates some 
probability distributions.) 

Hardy, G. H. (1955). A Course of Pure Mathematics, 10th ed. The University Press, Cambridge, England. (Chap. 1 in this classic book is recommended reading for understanding the real number system.)

Harris, B. (1966). Theory of Probability. Addison-Wesley, Reading, Massachusetts. 
(Chaps. 2 and 3 discuss some elementary concepts in probability theory as well as 
in distribution theory. Many exercises are provided.) 

Hogg, R. V., and A. T. Craig (1965). Introduction to Mathematical Statistics, 2nd ed. Macmillan, New York. (Chap. 1 is an introduction to distribution theory; examples of some special distributions are given in Chap. 3; Chap. 10 considers some aspects of hypothesis testing that pertain to Example 1.7.5.)

Johnson, N. L., and S. Kotz (1969). Discrete Distributions. Houghton Mifflin, Boston. (This is the first volume in a series of books on statistical distributions. It is an excellent source for getting detailed accounts of the properties and uses of these distributions. This volume deals with discrete distributions, including the binomial in Chap. 3, the Poisson in Chap. 4, the negative binomial in Chap. 5, and the hypergeometric in Chap. 6.)

Johnson, N. L., and S. Kotz (1970a). Continuous Univariate Distributions-1. Houghton Mifflin, Boston. (This volume covers continuous distributions, including the normal in Chap. 13, lognormal in Chap. 14, Cauchy in Chap. 16, gamma in Chap. 17, and the exponential in Chap. 18.)

Johnson, N. L., and S. Kotz (1970b). Continuous Univariate Distributions-2. Houghton Mifflin, Boston. (This is a continuation of Vol. 1 on continuous distributions. Chaps. 24, 25, 26, and 27 discuss the beta, continuous uniform, F, and t distributions, respectively.)

Johnson, P. E. (1972). A History of Set Theory. Prindle, Weber, and Schmidt, Boston. (This book presents a historical account of set theory as it was developed by Georg Cantor.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (Sections 1.1, 1.2, 2.1, 3.1, 3.2, and 3.3 present introductory material on probability models and distributions; Chap. 6 discusses tests of hypotheses and statistical inference.)

Lloyd, E. (1980). Handbook of Applicable Mathematics, Vol. II. Wiley, New York. (This is the second volume in a series of six volumes designed as texts of mathematics for professionals. Chaps. 1, 2, and 3 present expository material on probability; Chaps. 4 and 5 discuss random variables and their distributions.)

Randles, R. H., and D. A. Wolfe (1979). Introduction to the Theory of Nonparametric 
Statistics. Wiley, New York. (Section 1.3 in this book discusses the “equal in 
distribution” property mentioned in Example 1.7.4.) 

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New York. (Chap. 1 discusses the real number system; Chap. 2 deals with countable, uncountable, and bounded sets and pertains to Sections 1.4, 1.5, and 1.6.)





Stoll, R. R. (1963). Set Theory and Logic. W. H. Freeman, San Francisco. (Chap. 1 is an introduction to set theory; Chap. 2 discusses countable sets; Chap. 3 is useful in understanding the real number system.)

Tucker, H. G. (1962). Probability and Mathematical Statistics. Academic Press, New York. (Chaps. 1, 3, 4, and 6 discuss basic concepts in elementary probability and distribution theory.)

Vilenkin, N. Y. (1968). Stories about Sets. Academic Press, New York. (This is an interesting book that presents various notions of set theory in an informal and delightful way. It contains many unusual stories and examples that make the learning of set theory rather enjoyable.)

Zaring, W. M. (1967). An Introduction to Analysis. Macmillan, New York. (Chap. 2 
gives an introduction to set theory; Chap. 3 discusses functions and relations.) 


EXERCISES 
In Mathematics 

1.1. Verify Results 1.2.3-1.2.12. 

1.2. Verify Results 1.2.13-1.2.16. 

1.3. Let A, B, and C be sets such that $A \cap B \subset \bar{C}$ and $A \cup C \subset B$. Show that A and C are disjoint.

1.4. Let A, B, and C be sets such that $C = (A - B) \cup (B - A)$. The set C is called the symmetric difference of A and B and is denoted by $A \triangle B$. Show that

(a) $A \triangle B = (A \cup B) - (A \cap B)$.

(b) $A \triangle (B \triangle D) = (A \triangle B) \triangle D$, where D is any set.

(c) $A \cap (B \triangle D) = (A \cap B) \triangle (A \cap D)$, where D is any set.

1.5. Let $A = J^+ \times J^+$, where $J^+$ is the set of positive integers. Define a relation $\rho$ in A as follows: If $(m_1, n_1)$ and $(m_2, n_2)$ are elements in A, then $(m_1, n_1) \rho (m_2, n_2)$ if $m_1 n_2 = n_1 m_2$. Show that $\rho$ is an equivalence relation and describe its equivalence classes.

1.6. Let A be the same set as in Exercise 1.5. Show that the following relation is an equivalence relation: $(m_1, n_1) \rho (m_2, n_2)$ if $m_1 + n_2 = n_1 + m_2$. Draw the equivalence class of (1, 2).

1.7. Consider the set $A = \{(-2, -5), (-1, -3), (1, 2), (3, 10)\}$. Show that A defines a function.

1.8. Let A and B be two sets and f be a function defined on A such that $f(A) \subset B$. If $A_1, A_2, \ldots, A_n$ are subsets of A, then show that:

(b) $f\left(\bigcap_{i=1}^{n} A_i\right) \subset \bigcap_{i=1}^{n} f(A_i)$.

Under what conditions are the two sides in (b) equal?


1.9. Prove Corollary 1.4.1.

1.10. Prove Corollary 1.4.2.

1.11. Show that the set $A = \{3, 9, 19, 33, 51, 73, \ldots\}$ is countable.

1.12. Show that is an irrational number.

1.13. Let a, b, c, and d be rational numbers such that $a + \sqrt{b} = c + \sqrt{d}$. Then, either

(a) $a = c$, $b = d$, or

(b) b and d are both squares of rational numbers.

1.14. Let $A \subset R$ be a nonempty set bounded from below. Define $-A$ to be the set $\{-x \mid x \in A\}$. Show that $\inf(A) = -\sup(-A)$.

1.15. Let $A \subset R$ be a closed and bounded set, and let $\sup(A) = b$. Show that $b \in A$.

1.16. Prove Theorem 1.6.2.

1.17. Let $(A, \mathcal{T})$ be a topological space. Show that $G \subset \mathcal{T}$ is a basis for $\mathcal{T}$ if and only if for each $B \in \mathcal{T}$ and each $p \in B$, there is a $U \in G$ such that $p \in U \subset B$.

1.18. Show that if A and B are closed sets, then $A \cup B$ is a closed set.

1.19. Let $B \subset A$ be a closed subset of a compact set A. Show that B is compact.

1.20. Is a compact subset of a compact set necessarily closed?


In Statistics 


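1.21. Let X be a random variable. Consider the following events:

$A_n = \{\omega \in \Omega \mid X(\omega) \le x + 3^{-n}\}, \quad n = 1, 2, \ldots,$

$B_n = \{\omega \in \Omega \mid X(\omega) \le x - 3^{-n}\}, \quad n = 1, 2, \ldots,$

$A = \{\omega \in \Omega \mid X(\omega) \le x\},$

$B = \{\omega \in \Omega \mid X(\omega) < x\},$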





EXERCISES 




where x is a real number. Show that for any x,

(a) $\bigcap_{n=1}^{\infty} A_n = A$;

(b) $\bigcup_{n=1}^{\infty} B_n = B$.

1.22. Let X be a nonnegative random variable such that $E(X) = \mu$ is finite,
where E(X) denotes the expected value of X. The following inequality,
known as Markov's inequality, is true:

$$P(X \ge h) \le \frac{\mu}{h},$$

where h is any positive number. Consider now a Poisson random
variable with parameter λ.

(a) Find an upper bound on the probability $P(X \ge 2)$ using Markov's
inequality.

(b) Obtain the exact probability value in (a), and demonstrate that it is
smaller than the corresponding upper bound in Markov's inequality.
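The comparison asked for in Exercise 1.22 can be explored numerically. The sketch below assumes the event is read as $P(X \ge 2)$ and uses a few values of λ chosen here purely for illustration (they are not part of the exercise). For a Poisson variable the exact tail is $1 - e^{-\lambda}(1 + \lambda)$, while Markov's bound with h = 2 is λ/2.

```python
import math


def markov_bound(mean: float, h: float) -> float:
    """Markov's upper bound on P(X >= h) for a nonnegative X with E(X) = mean."""
    return mean / h


def poisson_tail_at_least_2(lam: float) -> float:
    """Exact P(X >= 2) for X ~ Poisson(lam), i.e. 1 - P(X = 0) - P(X = 1)."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)


# Illustrative parameter values (an assumption, not taken from the exercise).
for lam in (0.5, 1.0, 2.0, 4.0):
    bound = markov_bound(lam, 2.0)
    exact = poisson_tail_at_least_2(lam)
    print(f"lambda={lam}: exact={exact:.4f}, Markov bound={bound:.4f}")
```

For λ ≥ 2 the bound is at least 1 and therefore carries no information, which shows how loose Markov's inequality can be.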

1.23. Let X be a random variable whose expected value μ and variance σ²
exist. Show that for any positive constants c and k,

(a) $P(|X - \mu| \ge c) \le \sigma^2/c^2$,

(b) $P(|X - \mu| \ge k\sigma) \le 1/k^2$,

(c) $P(|X - \mu| < k\sigma) \ge 1 - 1/k^2$.

The preceding three inequalities are equivalent versions of the so-called
Chebyshev's inequality.


1.24. Let X be a continuous random variable with the density function



— 1 <x < 1, 

elsewhere . 


By definition, the density function of X is a nonnegative function f(x) such
that $F(x) = \int_{-\infty}^{x} f(t)\,dt$, where F(x) is the cumulative distribution function
of X.

(a) Apply Markov’s inequality to finding upper bounds on the following 
probabilities: (i) P(|X| > §); (ii) P(|X| > |). 

(b) Compute the exact value of P(|X| > |), and compare it against the 
upper bound in (a)(i). 


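1.25. Let $X_1, X_2, \ldots, X_n$ be n continuous random variables. Define the
random variables $X_{(1)}$ and $X_{(n)}$ as

$$X_{(1)} = \min_{1 \le i \le n} \{X_1, X_2, \ldots, X_n\},$$

$$X_{(n)} = \max_{1 \le i \le n} \{X_1, X_2, \ldots, X_n\}.$$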





Show that for any x,

(a) $P(X_{(1)} > x) = P(X_1 > x, X_2 > x, \ldots, X_n > x)$,

(b) $P(X_{(n)} \le x) = P(X_1 \le x, X_2 \le x, \ldots, X_n \le x)$.

In particular, if $X_1, X_2, \ldots, X_n$ form a sample of size n from a
population with a cumulative distribution function F(x), show that

(c) $P(X_{(1)} \le x) = 1 - [1 - F(x)]^n$,

(d) $P(X_{(n)} \le x) = [F(x)]^n$.

The statistics $X_{(1)}$ and $X_{(n)}$ are called the first-order and nth-order
statistics, respectively.
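Formulas (c) and (d) can be checked by simulation. The sketch below assumes, for illustration only, a Uniform(0, 1) population, so that F(x) = x on [0, 1]. It compares the closed forms with Monte Carlo frequencies from a seeded generator; the values x = 0.3 and n = 5 are arbitrary choices.

```python
import random


def cdf_min(F_x: float, n: int) -> float:
    """P(X_(1) <= x) = 1 - [1 - F(x)]^n for an i.i.d. sample of size n."""
    return 1.0 - (1.0 - F_x) ** n


def cdf_max(F_x: float, n: int) -> float:
    """P(X_(n) <= x) = [F(x)]^n for an i.i.d. sample of size n."""
    return F_x ** n


def simulate(x: float, n: int, trials: int = 100_000, seed: int = 1):
    """Monte Carlo estimates of P(X_(1) <= x) and P(X_(n) <= x) for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    hits_min = hits_max = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        hits_min += min(sample) <= x  # bool counts as 0 or 1
        hits_max += max(sample) <= x
    return hits_min / trials, hits_max / trials


x, n = 0.3, 5  # for Uniform(0, 1), F(x) = x when 0 <= x <= 1
print("formula:   ", cdf_min(x, n), cdf_max(x, n))
print("simulation:", simulate(x, n))
```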

1.26. Suppose that we have a sample of size n = 5 from a population with an
exponential distribution whose density function is 

\ ü elsewhere . 

Find the value of P(2 < < 3). 



CHAPTER 2 


Basic Concepts in Linear Algebra 


In this chapter we present some fundamental concepts concerning vector
spaces and matrix algebra. The purpose of the chapter is to familiarize the 
reader with these concepts, since they are essential to the understanding of 
some of the remaining chapters. For this reason, most of the theorems in this 
chapter will be stated without proofs. There are several excellent books on 
linear algebra that can be used for a more detailed study of this subject (see 
the bibliography at the end of this chapter). 

In statistics, matrix algebra is used quite extensively, especially in linear 
models and multivariate analysis. The books by Basilevsky (1983), Graybill 
(1983), Magnus and Neudecker (1988), and Searle (1982) include many 
applications of matrices in these areas. 

In this chapter, as well as in the remainder of the book, elements of the
set of real numbers, R, are sometimes referred to as scalars. The Cartesian
product $\times_{i=1}^{n} R$ is denoted by $R^n$, which is also known as the n-dimensional
Euclidean space. Unless otherwise stated, all matrix elements are considered
to be real numbers.


2.1. VECTOR SPACES AND SUBSPACES 

A vector space over R is a set V of elements called vectors together with two
operations, addition and scalar multiplication, that satisfy the following
conditions:

1. $u + v$ is an element of V for all u, v in V.

2. If α is a scalar and $u \in V$, then $\alpha u \in V$.

3. $u + v = v + u$ for all u, v in V.

4. $u + (v + w) = (u + v) + w$ for all u, v, w in V.

5. There exists an element $0 \in V$ such that $0 + u = u$ for all u in V. This
element is called the zero vector.




6. For each $u \in V$ there exists a $v \in V$ such that $u + v = 0$.

7. $\alpha(u + v) = \alpha u + \alpha v$ for any scalar α and any u and v in V.

8. $(\alpha + \beta)u = \alpha u + \beta u$ for any scalars α and β and any u in V.

9. $\alpha(\beta u) = (\alpha\beta)u$ for any scalars α and β and any u in V.

10. $1u = u$ for any $u \in V$.

Example 2.1.1. A familiar example of a vector space is the n-dimensional
Euclidean space $R^n$. Here, addition and multiplication are defined as
follows: If $(u_1, u_2, \ldots, u_n)$ and $(v_1, v_2, \ldots, v_n)$ are two elements in $R^n$, then
their sum is defined as $(u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$. If α is a scalar, then
$\alpha(u_1, u_2, \ldots, u_n) = (\alpha u_1, \alpha u_2, \ldots, \alpha u_n)$.

Example 2.1.2. Let V be the set of all polynomials in x of degree less
than or equal to k. Then V is a vector space. Any element in V can be
expressed as $\sum_{i=0}^{k} a_i x^i$, where the $a_i$'s are scalars.

Example 2.1.3. Let V be the set of all functions defined on the closed
interval [−1, 1]. Then V is a vector space. It can be seen that $f(x) + g(x)$ and
$\alpha f(x)$ belong to V, where f(x) and g(x) are elements in V and α is any
scalar.

Example 2.1.4. The set V of all nonnegative functions defined on [−1, 1]
is not a vector space, since if $f(x) \in V$ is not identically zero and α is a
negative scalar, then $\alpha f(x) \notin V$.

Example 2.1.5. Let V be the set of all points (x, y) on a straight line
given by the equation $2x - y + 1 = 0$. Then V is not a vector space. This is
because if $(x_1, y_1)$ and $(x_2, y_2)$ belong to V, then $(x_1 + x_2, y_1 + y_2) \notin V$, since
$2(x_1 + x_2) - (y_1 + y_2) + 1 = -1 \ne 0$. Alternatively, we can state that V is not
a vector space because the zero element (0, 0) does not belong to V. This
violates condition 5 for a vector space.

A subset W of a vector space V is said to form a vector subspace if W
itself is a vector space. Equivalently, W is a subspace if whenever $u, v \in W$
and α is a scalar, then $u + v \in W$ and $\alpha u \in W$. For example, the set W of all
continuous functions defined on [−1, 1] is a vector subspace of V in Example
2.1.3. Also, the set of all points on the straight line $y - 2x = 0$ is a vector
subspace of $R^2$. However, the points on any straight line in $R^2$ not going
through the origin (0, 0) do not form a vector subspace, as was seen in
Example 2.1.5.
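The subspace test (closure under addition and scalar multiplication) can be tried on the two lines just mentioned. The membership functions and sample points below are illustrative choices. Note that a finite check like this can only refute closure, never prove it; it simply exhibits the failure from Example 2.1.5.

```python
from fractions import Fraction


def on_line_through_origin(p):
    """Membership in W = {(x, y) : y - 2x = 0}."""
    x, y = p
    return y - 2 * x == 0


def on_shifted_line(p):
    """Membership in V = {(x, y) : 2x - y + 1 = 0} from Example 2.1.5."""
    x, y = p
    return 2 * x - y + 1 == 0


def closure_holds(member, u, v, alpha):
    """For members u, v and a scalar alpha, test whether u + v and alpha*u stay in the set."""
    assert member(u) and member(v), "u and v must belong to the set"
    total = (u[0] + v[0], u[1] + v[1])
    scaled = (alpha * u[0], alpha * u[1])
    return member(total) and member(scaled)


print(closure_holds(on_line_through_origin, (1, 2), (Fraction(-3, 2), -3), Fraction(5, 7)))  # True
print(closure_holds(on_shifted_line, (0, 1), (1, 3), 2))  # False: (1, 4) is not on the line
print(on_shifted_line((0, 0)))  # False: the zero vector is not in V (condition 5 fails)
```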

Definition 2.1.1. Let V be a vector space, and $u_1, u_2, \ldots, u_n$ be a collection
of n elements in V. These elements are said to be linearly dependent if
there exist n scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all equal to zero, such that $\sum_{i=1}^{n} \alpha_i u_i = 0$.
If, however, $\sum_{i=1}^{n} \alpha_i u_i = 0$ is true only when all the $\alpha_i$'s are zero, then





$u_1, u_2, \ldots, u_n$ are linearly independent. It should be noted that if $u_1, u_2, \ldots, u_n$
are linearly independent, then none of them can be zero. If, for example,
$u_1 = 0$, then $\alpha u_1 + 0u_2 + \cdots + 0u_n = 0$ for any $\alpha \ne 0$, which implies that the
$u_i$'s are linearly dependent, a contradiction. □

From the preceding definition we can say that a collection of n elements
in a vector space is linearly dependent if at least one element in this
collection can be expressed as a linear combination of the remaining n − 1
elements. If no element, however, can be expressed in this fashion, then the
n elements are linearly independent. For example, in $R^3$, (1, 2, −2), (−1, 0, 3),
and (1, 4, −1) are linearly dependent, since $2(1, 2, -2) + (-1, 0, 3) - (1, 4, -1) = 0$.
On the other hand, it can be verified that (1, 1, 0), (1, 0, 2), and
(0, 1, 3) are linearly independent.
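Both claims in the example can be verified mechanically: n vectors are linearly independent exactly when the matrix having them as rows has rank n (rank is discussed in Section 2.3.2). The sketch below computes the rank by Gaussian elimination in exact rational arithmetic, so no rounding tolerance is needed; the function names are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # look for a nonzero pivot in column c at or below row r
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """n vectors are linearly independent exactly when the matrix they form has rank n."""
    return rank(vectors) == len(vectors)


print(linearly_independent([(1, 2, -2), (-1, 0, 3), (1, 4, -1)]))  # False
print(linearly_independent([(1, 1, 0), (1, 0, 2), (0, 1, 3)]))     # True
```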

Definition 2.1.2. Let $u_1, u_2, \ldots, u_n$ be n elements in a vector space V.
The collection of all linear combinations of the form $\sum_{i=1}^{n} \alpha_i u_i$, where the $\alpha_i$'s
are scalars, is called a linear span of $u_1, u_2, \ldots, u_n$ and is denoted by
$L(u_1, u_2, \ldots, u_n)$. □

It is easy to see from the preceding definition that $L(u_1, u_2, \ldots, u_n)$ is a
vector subspace of V. This vector subspace is said to be spanned by
$u_1, u_2, \ldots, u_n$.

Definition 2.1.3. Let V be a vector space. If there exist linearly independent
elements $u_1, u_2, \ldots, u_n$ in V such that $V = L(u_1, u_2, \ldots, u_n)$, then
$u_1, u_2, \ldots, u_n$ are said to form a basis for V. The number n of elements in
this basis is called the dimension of the vector space and is denoted by dim V.

□

Note that a basis for a vector space is not unique. However, its dimension
is unique. For example, the three vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a
basis for $R^3$. Another basis for $R^3$ consists of (1, 1, 0), (1, 0, 1), and (0, 1, 1).

If $u_1, u_2, \ldots, u_n$ form a basis for V and if u is a given element in V, then
there exists a unique set of scalars, $\alpha_1, \alpha_2, \ldots, \alpha_n$, such that $u = \sum_{i=1}^{n} \alpha_i u_i$. To
show this, suppose that there exists another set of scalars, $\beta_1, \beta_2, \ldots, \beta_n$,
such that $u = \sum_{i=1}^{n} \beta_i u_i$. Then $\sum_{i=1}^{n} (\alpha_i - \beta_i) u_i = 0$, which implies that $\alpha_i = \beta_i$
for all i, since the $u_i$'s are linearly independent.
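For a basis of $R^n$, the unique scalars $\alpha_i$ can be found by solving the linear system $\sum_i \alpha_i u_i = u$. The sketch below does this by Gauss-Jordan elimination in exact arithmetic, assuming the n given vectors form a basis of $R^n$; it uses the second basis of $R^3$ mentioned above and an arbitrary target vector u = (2, 3, 5).

```python
from fractions import Fraction


def coordinates(basis, u):
    """Solve sum_i alpha_i * basis[i] = u exactly; the n basis vectors must span R^n."""
    n = len(basis)
    # row i of the augmented matrix is the i-th component equation
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(u[i])] for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c] != 0)  # StopIteration if not a basis
        m[c], m[piv] = m[piv], m[c]
        m[c] = [x / m[c][c] for x in m[c]]                   # make the pivot equal to 1
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]


basis = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
alpha = coordinates(basis, (2, 3, 5))
print(alpha)  # the coordinates 0, 2, 3 (as Fractions)
```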

Let us now check the dimensions of the vector spaces for some of the
examples described earlier. For Example 2.1.1, dim V = n. In Example 2.1.2,
$\{1, x, \ldots, x^k\}$ is a basis for V; hence dim V = k + 1. As for Example 2.1.3,
dim V is infinite, since there is no finite set of functions that can span V.

Definition 2.1.4. Let u and v be two vectors in $R^n$. The dot product (also
called scalar product or inner product) of u and v is a scalar denoted by u · v
and is given by

$$u \cdot v = \sum_{i=1}^{n} u_i v_i,$$





where $u_i$ and $v_i$ are the ith components of u and v, respectively
($i = 1, 2, \ldots, n$). In particular, if u = v, then $(u \cdot u)^{1/2} = \left(\sum_{i=1}^{n} u_i^2\right)^{1/2}$ is called the
Euclidean norm (or length) of u and is denoted by $\|u\|_2$. The dot product of
u and v is also equal to $\|u\|_2 \|v\|_2 \cos\theta$, where θ is the angle between u and v.

□

Definition 2.1.5. Two vectors u and v in $R^n$ are said to be orthogonal if
their dot product is zero. □

Definition 2.1.6. Let U be a vector subspace of $R^n$. The vectors
$e_1, e_2, \ldots, e_m$ form an orthonormal basis for U if they satisfy the following
properties:

1. $e_1, e_2, \ldots, e_m$ form a basis for U.

2. $e_i \cdot e_j = 0$ for all $i \ne j$ $(i, j = 1, 2, \ldots, m)$.

3. $\|e_i\|_2 = 1$ for $i = 1, 2, \ldots, m$.

Any collection of vectors satisfying just properties 2 and 3 is said to be
orthonormal. □


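Theorem 2.1.1. Let $u_1, u_2, \ldots, u_m$ be a basis for a vector subspace U of
$R^n$. Then there exists an orthonormal basis, $e_1, e_2, \ldots, e_m$, for U, given by

$$e_1 = \frac{v_1}{\|v_1\|_2}, \quad \text{where } v_1 = u_1,$$

$$e_2 = \frac{v_2}{\|v_2\|_2}, \quad \text{where } v_2 = u_2 - \frac{v_1 \cdot u_2}{\|v_1\|_2^2}\, v_1,$$

$$\vdots$$

$$e_m = \frac{v_m}{\|v_m\|_2}, \quad \text{where } v_m = u_m - \sum_{i=1}^{m-1} \frac{v_i \cdot u_m}{\|v_i\|_2^2}\, v_i.$$

Proof. See Graybill (1983, Theorem 2.6.5). □

The procedure of constructing an orthonormal basis from any given basis
as described in Theorem 2.1.1 is known as the Gram-Schmidt orthonormalization
procedure.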
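The Gram-Schmidt formulas of Theorem 2.1.1 translate directly into a short loop. This is a floating-point sketch assuming the input vectors are linearly independent (otherwise some $v_k$ is zero and the division fails); it is applied to the basis (1, 1, 0), (1, 0, 1), (0, 1, 1) of $R^3$ mentioned earlier.

```python
import math


def dot(u, v):
    """Dot product u . v = sum_i u_i v_i."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(us):
    """Orthonormal basis e_1, ..., e_m from a basis u_1, ..., u_m (Theorem 2.1.1)."""
    vs = []
    for u in us:
        v = [float(a) for a in u]
        # v_k = u_k - sum_{i<k} (v_i . u_k / ||v_i||_2^2) v_i
        for w in vs:
            coef = dot(w, u) / dot(w, w)
            v = [a - coef * b for a, b in zip(v, w)]
        vs.append(v)
    # e_k = v_k / ||v_k||_2
    return [[a / math.sqrt(dot(v, v)) for a in v] for v in vs]


es = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for k, e in enumerate(es, start=1):
    print(k, [round(a, 4) for a in e])
```

In floating point this "classical" form can lose orthogonality for nearly dependent inputs; that numerical issue does not affect the exact statement of the theorem.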

Theorem 2.1.2. Let u and v be two vectors in $R^n$. Then:

1. $|u \cdot v| \le \|u\|_2 \|v\|_2$.

2. $\|u + v\|_2 \le \|u\|_2 + \|v\|_2$.

Proof. See Marcus and Minc (1988, Theorem 3.4). □





The inequality in part 1 of Theorem 2.1.2 is known as the Cauchy-Schwarz 
inequality. The one in part 2 is called the triangle inequality. 

Definition 2.1.7. Let U be a vector subspace of $R^n$. The orthogonal
complement of U, denoted by $U^{\perp}$, is the vector subspace of $R^n$ which
consists of all vectors v such that $u \cdot v = 0$ for all u in U. □

Definition 2.1.8. Let $U_1, U_2, \ldots, U_n$ be vector subspaces of the vector
space U. The direct sum of these vector subspaces, denoted by $\bigoplus_{i=1}^{n} U_i$,
consists of all vectors u that can be uniquely expressed as $u = \sum_{i=1}^{n} u_i$, where
$u_i \in U_i$, $i = 1, 2, \ldots, n$. □

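Theorem 2.1.3. Let $U_1, U_2, \ldots, U_n$ be vector subspaces of the vector space
U. Then:

1. $\bigoplus_{i=1}^{n} U_i$ is a vector subspace of U.

2. If $U = \bigoplus_{i=1}^{n} U_i$, then $\bigcap_{i=1}^{n} U_i$ consists of just the zero element 0 of U.

3. $\dim \bigoplus_{i=1}^{n} U_i = \sum_{i=1}^{n} \dim U_i$.

Proof. The proof is left as an exercise. □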

Theorem 2.1.4. Let U be a vector subspace of $R^n$. Then $R^n = U \oplus U^{\perp}$.

Proof. See Marcus and Minc (1988, Theorem 3.3). □

From Theorem 2.1.4 we conclude that any $v \in R^n$ can be uniquely written
as $v = v_1 + v_2$, where $v_1 \in U$ and $v_2 \in U^{\perp}$. In this case, $v_1$ and $v_2$ are called
the projections of v on U and $U^{\perp}$, respectively.
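One way to compute the decomposition $v = v_1 + v_2$ is to orthonormalize a basis of U and sum the components of v along the resulting unit vectors; the remainder then lies in $U^{\perp}$. The sketch below assumes U is given by linearly independent spanning vectors; the subspace U = L((1, 1, 0), (1, 0, 1)) of $R^3$ and v = (1, 2, 3) are illustrative choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(us):
    """Gram-Schmidt orthonormalization of linearly independent vectors (Theorem 2.1.1)."""
    es = []
    for u in us:
        v = list(u)
        for e in es:
            # with unit vectors e, the coefficient v_i . u / ||v_i||^2 reduces to e . u
            c = dot(e, u)
            v = [a - c * b for a, b in zip(v, e)]
        norm = math.sqrt(dot(v, v))
        es.append([a / norm for a in v])
    return es


def project(v, us):
    """Split v into v1 in U = L(us) and v2 = v - v1 in the orthogonal complement of U."""
    v1 = [0.0] * len(v)
    for e in orthonormal_basis(us):
        c = dot(e, v)
        v1 = [a + c * b for a, b in zip(v1, e)]
    v2 = [a - b for a, b in zip(v, v1)]
    return v1, v2


U_basis = [(1, 1, 0), (1, 0, 1)]
v1, v2 = project((1, 2, 3), U_basis)
print([round(a, 4) for a in v1], [round(a, 4) for a in v2])
```

Here $v_1 = (7/3, 2/3, 5/3)$ and $v_2 = (-4/3, 4/3, 4/3)$, and $v_2$ is orthogonal to both spanning vectors of U.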


2.2. LINEAR TRANSFORMATIONS 


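Let U and V be two vector spaces. A function $T: U \to V$ is called a linear
transformation if $T(\alpha_1 u_1 + \alpha_2 u_2) = \alpha_1 T(u_1) + \alpha_2 T(u_2)$ for all $u_1, u_2$ in U
and any scalars $\alpha_1$ and $\alpha_2$. For example, let $T: R^3 \to R^3$ be defined as

$$T(x_1, x_2, x_3) = (x_1 - x_2,\; x_1 + x_3,\; x_3).$$

Then T is a linear transformation, since

$$\begin{aligned}
T[\alpha(x_1, x_2, x_3) + \beta(y_1, y_2, y_3)]
&= T(\alpha x_1 + \beta y_1,\; \alpha x_2 + \beta y_2,\; \alpha x_3 + \beta y_3) \\
&= (\alpha x_1 + \beta y_1 - \alpha x_2 - \beta y_2,\; \alpha x_1 + \beta y_1 + \alpha x_3 + \beta y_3,\; \alpha x_3 + \beta y_3) \\
&= \alpha(x_1 - x_2,\; x_1 + x_3,\; x_3) + \beta(y_1 - y_2,\; y_1 + y_3,\; y_3) \\
&= \alpha T(x_1, x_2, x_3) + \beta T(y_1, y_2, y_3).
\end{aligned}$$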





We note that the image of U under T, or the range of T, namely T(U), is
a vector subspace of V. This is true because if $v_1, v_2$ are in T(U), then there
exist $u_1$ and $u_2$ in U such that $v_1 = T(u_1)$ and $v_2 = T(u_2)$. Hence,
$v_1 + v_2 = T(u_1) + T(u_2) = T(u_1 + u_2)$, which belongs to T(U). Also, if α is a scalar,
then $\alpha T(u) = T(\alpha u) \in T(U)$ for any $u \in U$.

Definition 2.2.1. Let $T: U \to V$ be a linear transformation. The kernel of
T, denoted by ker T, is the collection of all vectors u in U such that $T(u) = 0$,
where 0 is the zero vector in V. The kernel of T is also called the null space
of T.

As an example of a kernel, let $T: R^3 \to R^2$ be defined as
$T(x_1, x_2, x_3) = (x_1 - x_2, x_1 - x_3)$. Then

$$\ker T = \{(x_1, x_2, x_3) \mid x_1 = x_2,\ x_1 = x_3\}.$$

In this case, ker T consists of all points $(x_1, x_2, x_3)$ in $R^3$ that lie on a straight
line through the origin given by the equations $x_1 = x_2 = x_3$. □

Theorem 2.2.1. Let $T: U \to V$ be a linear transformation. Then we have
the following:

1. ker T is a vector subspace of U.

2. $\dim U = \dim(\ker T) + \dim[T(U)]$.


Proof. Part 1 is left as an exercise. To prove part 2 we consider the
following. Let dim U = n, dim(ker T) = p, and dim[T(U)] = q. Let
$u_1, u_2, \ldots, u_p$ be a basis for ker T, and $v_1, v_2, \ldots, v_q$ be a basis for T(U). Then,
there exist vectors $w_1, w_2, \ldots, w_q$ in U such that $T(w_i) = v_i$ $(i = 1, 2, \ldots, q)$. We
need to show that $u_1, u_2, \ldots, u_p; w_1, w_2, \ldots, w_q$ form a basis for U, that is,
they are linearly independent and span U.

Suppose that there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_p; \beta_1, \beta_2, \ldots, \beta_q$ such that

$$\sum_{i=1}^{p} \alpha_i u_i + \sum_{i=1}^{q} \beta_i w_i = 0. \tag{2.1}$$

Then

$$\begin{aligned}
0 &= T\left(\sum_{i=1}^{p} \alpha_i u_i + \sum_{i=1}^{q} \beta_i w_i\right), \quad \text{where 0 represents the zero vector in } V, \\
&= \sum_{i=1}^{p} \alpha_i T(u_i) + \sum_{i=1}^{q} \beta_i T(w_i) \\
&= \sum_{i=1}^{q} \beta_i T(w_i), \quad \text{since } u_i \in \ker T,\ i = 1, 2, \ldots, p, \\
&= \sum_{i=1}^{q} \beta_i v_i.
\end{aligned}$$





Since the $v_i$'s are linearly independent, $\beta_i = 0$ for $i = 1, 2, \ldots, q$. From
(2.1) it follows that $\alpha_i = 0$ for $i = 1, 2, \ldots, p$, since the $u_i$'s are also linearly
independent. Thus the vectors $u_1, u_2, \ldots, u_p; w_1, w_2, \ldots, w_q$ are linearly
independent.

Let us now suppose that u is any vector in U. To show that it belongs to
$L(u_1, u_2, \ldots, u_p; w_1, w_2, \ldots, w_q)$, let $v = T(u)$. Then there exist scalars
$a_1, a_2, \ldots, a_q$ such that $v = \sum_{i=1}^{q} a_i v_i$. It follows that

$$T(u) = \sum_{i=1}^{q} a_i T(w_i) = T\left(\sum_{i=1}^{q} a_i w_i\right).$$

Thus,

$$T\left(u - \sum_{i=1}^{q} a_i w_i\right) = 0,$$

and $u - \sum_{i=1}^{q} a_i w_i$ must then belong to ker T. Hence,

$$u - \sum_{i=1}^{q} a_i w_i = \sum_{i=1}^{p} b_i u_i \tag{2.2}$$

for some scalars $b_1, b_2, \ldots, b_p$. From (2.2) we then have

$$u = \sum_{i=1}^{p} b_i u_i + \sum_{i=1}^{q} a_i w_i,$$

which shows that u belongs to the linear span of $u_1, u_2, \ldots, u_p; w_1, w_2, \ldots, w_q$.
We conclude that these vectors form a basis for U. Hence, n = p + q.

□
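For a linear map from $R^3$ given by a matrix acting on column vectors, dim T(U) equals the rank of that matrix, and Theorem 2.2.1 then gives dim(ker T). The sketch below applies this to the two maps of this section, writing each output component as a row of the matrix; it also confirms directly that (1, 1, 1) lies in the kernel of the first map. The helper names are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def image_of(M, x):
    """Image of x under the linear map whose matrix (acting on column vectors) is M."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


dim_U = 3

# T(x1, x2, x3) = (x1 - x2, x1 - x3): each row gives one output component.
T_kernel_example = [[1, -1, 0], [1, 0, -1]]
dim_image = rank(T_kernel_example)   # dim T(U)
dim_kernel = dim_U - dim_image       # dim(ker T) by Theorem 2.2.1
print(dim_image, dim_kernel)                   # 2 1
print(image_of(T_kernel_example, [1, 1, 1]))   # [0, 0]: (1, 1, 1) spans ker T

# T(x1, x2, x3) = (x1 - x2, x1 + x3, x3) from the start of Section 2.2.
T_first_example = [[1, -1, 0], [1, 0, 1], [0, 0, 1]]
print(dim_U - rank(T_first_example))  # 0, so this T is one-to-one (Corollary 2.2.1)
```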


Corollary 2.2.1. $T: U \to V$ is a one-to-one linear transformation if and
only if dim(ker T) = 0.

Proof. If T is a one-to-one linear transformation, then ker T consists of
just one vector, namely, the zero vector. Hence, dim(ker T) = 0. Vice versa, if
dim(ker T) = 0, or equivalently, if ker T consists of just the zero vector, then
T must be a one-to-one transformation. This is true because if $u_1$ and $u_2$ are
in U and such that $T(u_1) = T(u_2)$, then $T(u_1 - u_2) = 0$, which implies that
$u_1 - u_2 \in \ker T$ and thus $u_1 - u_2 = 0$. □


2.3. MATRICES AND DETERMINANTS 

Matrix algebra was devised by the English mathematician Arthur Cayley 
(1821-1895). The use of matrices originated with Cayley in connection with 





linear transformations of the form

$$\begin{aligned}
a x_1 + b x_2 &= y_1, \\
c x_1 + d x_2 &= y_2,
\end{aligned}$$

where a, b, c, and d are scalars. This transformation is completely
determined by the square array

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

which is called a matrix of order 2 × 2. In general, let $T: U \to V$ be a linear
transformation, where U and V are vector spaces of dimensions m and n,
respectively. Let $u_1, u_2, \ldots, u_m$ be a basis for U and $v_1, v_2, \ldots, v_n$ be a basis for
V. For $i = 1, 2, \ldots, m$, consider $T(u_i)$, which can be uniquely represented as

$$T(u_i) = \sum_{j=1}^{n} a_{ij} v_j, \quad i = 1, 2, \ldots, m,$$

where the $a_{ij}$'s are scalars. These scalars completely determine all possible
values of T: If $u \in U$, then $u = \sum_{i=1}^{m} c_i u_i$ for some scalars $c_1, c_2, \ldots, c_m$. Then
$T(u) = \sum_{i=1}^{m} c_i T(u_i) = \sum_{i=1}^{m} c_i \left(\sum_{j=1}^{n} a_{ij} v_j\right)$. By definition, the rectangular array

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$

is called a matrix of order m × n, which indicates that A has m rows and n
columns. The $a_{ij}$'s are called the elements of A. In some cases it is more
convenient to represent A using the notation $A = (a_{ij})$. In particular, if
m = n, then A is called a square matrix. Furthermore, if the off-diagonal
elements of a square matrix A are zero, then A is called a diagonal matrix and
is written as $A = \mathrm{Diag}(a_{11}, a_{22}, \ldots, a_{nn})$. In this special case, if the diagonal
elements are equal to 1, then A is called the identity matrix and is denoted by
$I_n$ to indicate that it is of order n × n. A matrix of order m × 1 is called a
column vector. Likewise, a matrix of order 1 × n is called a row vector.



2.3.1. Basic Operations on Matrices 


1. Equality of Matrices. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two matrices of the
same order. Then A = B if and only if $a_{ij} = b_{ij}$ for all $i = 1, 2, \ldots, m$;
$j = 1, 2, \ldots, n$.




2. Addition of Matrices. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two matrices of
order m × n. Then A + B is a matrix $C = (c_{ij})$ of order m × n such that
$c_{ij} = a_{ij} + b_{ij}$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$.

3. Scalar Multiplication. Let α be a scalar, and $A = (a_{ij})$ be a matrix of
order m × n. Then $\alpha A = (\alpha a_{ij})$.

4. The Transpose of a Matrix. Let $A = (a_{ij})$ be a matrix of order m × n.
The transpose of A, denoted by A', is a matrix of order n × m whose
rows are the columns of A. For example,





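$$\text{if } A = \begin{bmatrix} 2 & 3 & 1 \\ -1 & 0 & 7 \end{bmatrix}, \quad \text{then } A' = \begin{bmatrix} 2 & -1 \\ 3 & 0 \\ 1 & 7 \end{bmatrix}.$$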


A matrix A is symmetric if A = A'. It is skew-symmetric if A' = −A.
A skew-symmetric matrix must necessarily have zero elements along its
diagonal.

5. Product of Matrices. Let $A = (a_{ij})$ and $B = (b_{ij})$ be matrices of orders
m × n and n × p, respectively. The product AB is a matrix $C = (c_{ij})$ of
order m × p such that $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, p)$.
It is to be noted that this product is defined only when the number of
columns of A is equal to the number of rows of B.

In particular, if a and b are column vectors of order n × 1, then their
dot product a · b can be expressed as a matrix product of the form a'b
or b'a.

6. The Trace of a Matrix. Let $A = (a_{ij})$ be a square matrix of order n × n.
The trace of A, denoted by tr(A), is the sum of its diagonal elements,
that is,

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

On the basis of this definition, it is easy to show that if A and B are
matrices of order n × n, then the following hold: (i) tr(AB) = tr(BA);
(ii) tr(A + B) = tr(A) + tr(B).
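The basic operations above can be written in a few lines of plain Python, with a matrix stored as a list of rows. This is a minimal sketch, not an efficient implementation; it reproduces the transpose example and checks tr(AB) = tr(BA) on two arbitrarily chosen 2 × 2 matrices.

```python
def transpose(A):
    """Rows of A' are the columns of A."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """c_ij = sum_k a_ik * b_kj; defined only when columns of A == rows of B."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


A = [[2, 3, 1], [-1, 0, 7]]
print(transpose(A))  # [[2, -1], [3, 0], [1, 7]]

P = [[1, 2], [3, 4]]
Q = [[0, 5], [-2, 1]]
print(trace(matmul(P, Q)), trace(matmul(Q, P)))  # 15 15, although PQ != QP
```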

Definition 2.3.1. Let $A = (a_{ij})$ be an m × n matrix. A submatrix B of A is
a matrix which can be obtained from A by deleting a certain number of rows
and columns.

In particular, if the ith row and jth column of A that contain the element
$a_{ij}$ are deleted, then the resulting matrix is denoted by $M_{ij}$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$.

Let us now suppose that A is a square matrix of order n × n. If rows
$i_1, i_2, \ldots, i_p$ and columns $i_1, i_2, \ldots, i_p$ are deleted from A, where p < n, then
the resulting submatrix is called a principal submatrix of A. In particular, if
the deleted rows and columns are the last p rows and the last p columns,
respectively, then such a submatrix is called a leading principal submatrix.





Definition 2.3.2. A partitioned matrix is a matrix that consists of several
submatrices obtained by drawing horizontal and vertical lines that separate it
into groups of rows and columns.

For example, the matrix

$$A = \left[\begin{array}{c|cc|cc}
1 & 0 & 3 & 4 & -5 \\
6 & 2 & 10 & 5 & 0 \\ \hline
3 & 2 & 1 & 0 & 2
\end{array}\right]$$

is partitioned into six submatrices by drawing one horizontal line and two
vertical lines as shown above.

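Definition 2.3.3. Let $A = (a_{ij})$ be an $m_1 \times n_1$ matrix and B be an $m_2 \times n_2$
matrix. The direct (or Kronecker) product of A and B, denoted by $A \otimes B$, is a
matrix of order $m_1 m_2 \times n_1 n_2$ defined as a partitioned matrix of the form

$$A \otimes B = \begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n_1}B \\
a_{21}B & a_{22}B & \cdots & a_{2n_1}B \\
\vdots & \vdots & & \vdots \\
a_{m_1 1}B & a_{m_1 2}B & \cdots & a_{m_1 n_1}B
\end{bmatrix}.$$

This matrix can be simplified by writing $A \otimes B = [a_{ij}B]$. □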

Properties of the direct product can be found in several matrix algebra
books and papers. See, for example, Graybill (1983, Section 8.8), Henderson
and Searle (1981), Magnus and Neudecker (1988, Chapter 2), and Searle
(1982, Section 10.7). Some of these properties are listed below:

1. $(A \otimes B)' = A' \otimes B'$.

2. $A \otimes (B \otimes C) = (A \otimes B) \otimes C$.

3. $(A \otimes B)(C \otimes D) = AC \otimes BD$, if AC and BD are defined.

4. $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$, if A and B are square matrices.

The paper by Henderson, Pukelsheim, and Searle (1983) gives a detailed
account of the history associated with direct products.
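The block definition $A \otimes B = [a_{ij}B]$ becomes a single nested comprehension when matrices are stored as lists of rows: row (i, k) of the product holds $a_{ij} b_{kl}$ in column (j, l). The sketch below checks properties 1, 3, and 4 on small integer matrices chosen for illustration; exact integer arithmetic makes the comparisons exact.

```python
def kron(A, B):
    """Direct (Kronecker) product A (x) B = [a_ij B], stored as a list of rows."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
D = [[1, -1], [0, 2]]

print(kron(A, B)[0])  # first row: [a11*b11, a11*b12, a12*b11, a12*b12]
print(transpose(kron(A, B)) == kron(transpose(A), transpose(B)))           # property 1
print(matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D)))  # property 3
print(trace(kron(A, B)) == trace(A) * trace(B))                            # property 4
```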

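Definition 2.3.4. Let $A_1, A_2, \ldots, A_k$ be matrices of orders $m_i \times n_i$
$(i = 1, 2, \ldots, k)$. The direct sum of these matrices, denoted by $\bigoplus_{i=1}^{k} A_i$, is a
partitioned matrix of order $\left(\sum_{i=1}^{k} m_i\right) \times \left(\sum_{i=1}^{k} n_i\right)$ that has the block-diagonal
form

$$\bigoplus_{i=1}^{k} A_i = \mathrm{Diag}(A_1, A_2, \ldots, A_k).$$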





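The following properties can be easily shown on the basis of the preceding
definition:

1. $\bigoplus_{i=1}^{k} A_i + \bigoplus_{i=1}^{k} B_i = \bigoplus_{i=1}^{k} (A_i + B_i)$, if $A_i$ and $B_i$ are of the same order
for $i = 1, 2, \ldots, k$.

2. $\left[\bigoplus_{i=1}^{k} A_i\right]\left[\bigoplus_{i=1}^{k} B_i\right] = \bigoplus_{i=1}^{k} A_i B_i$, if $A_i B_i$ is defined for $i = 1, 2, \ldots, k$.

3. $\left[\bigoplus_{i=1}^{k} A_i\right]' = \bigoplus_{i=1}^{k} A_i'$.

4. $\mathrm{tr}\left(\bigoplus_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} \mathrm{tr}(A_i)$. □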

Definition 2.3.5. Let $A = (a_{ij})$ be a square matrix of order n × n. The
determinant of A, denoted by det(A), is a scalar quantity that can be
computed iteratively as

$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det(A_j), \tag{2.3}$$

where $A_j$ is a submatrix of A obtained by deleting row 1 and column j
$(j = 1, 2, \ldots, n)$. For each j, the determinant of $A_j$ is obtained in terms of
determinants of matrices of order (n − 2) × (n − 2) using a formula similar to
(2.3). This process is repeated several times until the matrices on the
right-hand side of (2.3) become of order 2 × 2. The determinant of a 2 × 2
matrix such as $B = (b_{ij})$ is given by $\det(B) = b_{11}b_{22} - b_{12}b_{21}$. Thus by an
iterative application of formula (2.3), the value of det(A) can be fully
determined. For example, let A be the matrix



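$$A = \begin{bmatrix} 1 & 2 & -1 \\ 5 & 0 & 3 \\ 1 & 2 & 1 \end{bmatrix}.$$

Then $\det(A) = \det(A_1) - 2\det(A_2) - \det(A_3)$, where $A_1, A_2, A_3$ are 2 × 2
submatrices, namely

$$A_1 = \begin{bmatrix} 0 & 3 \\ 2 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 5 & 3 \\ 1 & 1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 5 & 0 \\ 1 & 2 \end{bmatrix}.$$

It follows that det(A) = −6 − 2(2) − 10 = −20. □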
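Formula (2.3) is naturally recursive: expand along the first row and call the same routine on each submatrix until 2 × 2 blocks remain. The sketch below does exactly that with 0-based indices, applied to the 3 × 3 matrix A with rows (1, 2, −1), (5, 0, 3), (1, 2, 1), whose first-row minors are −6, 2, and 10.

```python
def minor(A, i, j):
    """Submatrix of A obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant via repeated expansion along the first row, as in formula (2.3)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # (-1)^(j+1) in the 1-based formula becomes (-1)^j with a 0-based column index j
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))


A = [[1, 2, -1], [5, 0, 3], [1, 2, 1]]
print([det(minor(A, 0, j)) for j in range(3)])  # [-6, 2, 10]
print(det(A))                                    # -20
```

The number of terms grows like n!, so this direct use of (2.3) is practical only for small matrices; elimination-based methods are used for larger ones.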



Definition 2.3.6. Let $A = (a_{ij})$ be a square matrix of order n × n. The
determinant of $M_{ij}$, the submatrix obtained by deleting row i and column j,
is called a minor of A of order n − 1. The quantity $(-1)^{i+j}\det(M_{ij})$ is called
a cofactor of the corresponding (i, j)th element of A. More generally, if A is
an m × n matrix and if we strike out all but p rows and the same number of
columns from A, where $p \le \min(m, n)$, then the determinant of the resulting
submatrix is called a minor of A of order p.





The determinant of a principal submatrix of a square matrix A is called a
principal minor. If, however, we have a leading principal submatrix, then its
determinant is called a leading principal minor. □

Note 2.3.1. The determinant of a matrix A is defined only when A is a
square matrix.

Note 2.3.2. The expansion of det(A) in (2.3) was carried out by multiplying
the elements of the first row of A by their corresponding cofactors and
then summing over j (= 1, 2, ..., n). The same value of det(A) could have also
been obtained by similar expansions according to the elements of any row of
A (instead of the first row), or any column of A. Thus if $M_{ij}$ is a submatrix of
A obtained by deleting row i and column j, then det(A) can be obtained
by using any of the following expansions:

By row i: $\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij}), \quad i = 1, 2, \ldots, n.$

By column j: $\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij}), \quad j = 1, 2, \ldots, n.$


Note 2.3.3. Some of the properties of determinants are the following:

i. det(AB) = det(A)det(B), if A and B are n × n matrices.

ii. If A' is the transpose of A, then det(A') = det(A).

iii. If A is an n × n matrix and α is a scalar, then $\det(\alpha A) = \alpha^n \det(A)$.

iv. If any two rows (or columns) of A are identical, then det(A) = 0.

v. If any two rows (or columns) of A are interchanged, then det(A) is
multiplied by −1.

vi. If det(A) = 0, then A is called a singular matrix. Otherwise, A is a
nonsingular matrix.

vii. If A and B are matrices of orders m × m and n × n, respectively, then
the following hold: (a) $\det(A \otimes B) = [\det(A)]^n [\det(B)]^m$; (b) $\det(A \oplus B) = [\det(A)][\det(B)]$.

Note 2.3.4. The history of determinants dates back to the fourteenth
century. According to Smith (1958, page 273), the Chinese had some knowledge
of determinants as early as about 1300 A.D. Smith (1958, page 440) also
reported that the Japanese mathematician Seki Kôwa (1642-1708) had
discovered the expansion of a determinant in solving simultaneous equations.
In the West, the theory of determinants is believed to have originated with
the German mathematician Gottfried Leibniz (1646-1716) in 1693, ten years





after the work of Seki Kôwa. However, the actual development of the theory
of determinants did not begin until the publication of a book by Gabriel
Cramer (1704-1752) (see Price, 1947, page 85) in 1750. Other mathematicians
who contributed to this theory include Alexandre Vandermonde
(1735-1796), Pierre-Simon Laplace (1749-1827), Carl Gauss (1777-1855),
and Augustin-Louis Cauchy (1789-1857). Arthur Cayley (1821-1895) is credited
with having been the first to introduce the common present-day notation
of vertical bars enclosing a square matrix. For more interesting facts about
the history of determinants, the reader is advised to read the article by Price
(1947).


2.3.2. The Rank of a Matrix

Let $A = (a_{ij})$ be a matrix of order m × n. Let $u_1', u_2', \ldots, u_m'$ denote the row
vectors of A, and let $v_1, v_2, \ldots, v_n$ denote its column vectors. Consider the
linear spans of the row and column vectors, namely, $V_1 = L(u_1', u_2', \ldots, u_m')$ and
$V_2 = L(v_1, v_2, \ldots, v_n)$, respectively.

Theorem 2.3.1. The vector spaces $V_1$ and $V_2$ have the same dimension.

Proof. See Lancaster (1969, Theorem 1.15.1), or Searle (1982, Section 6.6).

□

Thus, for any matrix A, the number of linearly independent rows is the
same as the number of linearly independent columns.

Definition 2.3.7. The rank of a matrix A is the number of its linearly
independent rows (or columns). The rank of A is denoted by r(A). □

Theorem 2.3.2. If a matrix A has a nonzero minor of order r, and if all
minors of order r + 1 and higher (if they exist) are zero, then A has rank r.

Proof. See Lancaster (1969, Lemma 1, Section 1.15). □

For example, if A is the matrix 



3 -1 

1 2 , 

4 1 


then r(A) = 2. This is because det(A) = 0 and at least one minor of order 2 is
different from zero.





There are several properties associated with the rank of a matrix. Some of
these properties are the following:

1. r(A) = r(A').

2. The rank of A is unchanged if A is multiplied by a nonsingular matrix.
Thus if A is an m × n matrix and P is an n × n nonsingular matrix, then
r(A) = r(AP).

3. r(A) = r(AA') = r(A'A).

4. If the matrix A is partitioned as $A = [A_1 : A_2]$, where $A_1$ and $A_2$ are
submatrices of the same order, then $r(A_1 + A_2) \le r(A) \le r(A_1) + r(A_2)$.
More generally, if the matrices $A_1, A_2, \ldots, A_k$ are of the same order
and if A is partitioned as $A = [A_1 : A_2 : \cdots : A_k]$, then

$$r\left(\sum_{i=1}^{k} A_i\right) \le r(A) \le \sum_{i=1}^{k} r(A_i).$$

5. If the product AB is defined, then $r(A) + r(B) - n \le r(AB) \le \min\{r(A), r(B)\}$,
where n is the number of columns of A (or the number of rows of B).

6. $r(A \otimes B) = r(A)\,r(B)$.

7. $r(A \oplus B) = r(A) + r(B)$.
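Several of these rank properties can be checked on a concrete example. The sketch computes ranks by exact rational elimination and tests properties 1, 3, and 5 for an illustrative 3 × 3 matrix A of rank 2 (its second row is twice the first) and a 3 × 2 matrix B of rank 2. Such checks illustrate the properties; they do not prove them.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # second row = 2 x first row, so r(A) = 2
B = [[1, 0], [0, 1], [1, 1]]           # r(B) = 2
At = transpose(A)

print(rank(A), rank(At))                          # property 1
print(rank(matmul(A, At)), rank(matmul(At, A)))   # property 3
n = len(A[0])                                     # columns of A = rows of B
rAB = rank(matmul(A, B))
print(rank(A) + rank(B) - n <= rAB <= min(rank(A), rank(B)))  # property 5
```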

Definition 2.3.8. Let A be a matrix of order m × n and rank r. Then we
have the following:

1. A is said to have a full row rank if r = m < n.

2. A is said to have a full column rank if r = n < m.

3. A is of full rank if r = m = n. In this case, $\det(A) \ne 0$, that is, A is a
nonsingular matrix. □


2.3.3. The Inverse of a Matrix

Let $A = (a_{ij})$ be a nonsingular matrix of order n × n. The inverse of A,
denoted by $A^{-1}$, is an n × n matrix that satisfies the condition
$AA^{-1} = A^{-1}A = I_n$.

The inverse of A can be computed as follows: Let $c_{ij}$ be the cofactor of $a_{ij}$
(see Definition 2.3.6). Define the matrix C as $C = (c_{ij})$. The transpose of C is
called the adjugate or adjoint of A and is denoted by adj A. The inverse of A
is then given by

$$A^{-1} = \frac{\mathrm{adj}\,A}{\det(A)}.$$





It can be verified that

$$A\,\frac{\mathrm{adj}\,A}{\det(A)} = \frac{\mathrm{adj}\,A}{\det(A)}\,A = I_n.$$
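The adjugate formula can be carried out exactly with rational arithmetic. This sketch builds the cofactor matrix C, transposes it to get adj A, divides by det(A), and verifies $AA^{-1} = I_3$. It assumes an n × n matrix with n ≥ 2 and integer entries, and uses the matrix A with rows (1, 2, −1), (5, 0, 3), (1, 2, 1), for which det(A) = −20.

```python
from fractions import Fraction


def minor(A, i, j):
    """M_ij: A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """adj A = C', where C = (c_ij) and c_ij = (-1)^(i+j) det(M_ij); assumes n >= 2."""
    n = len(A)
    C = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    return [list(col) for col in zip(*C)]


def inverse(A):
    """A^{-1} = adj A / det(A) in exact rational arithmetic (A must be nonsingular)."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular")
    return [[Fraction(x) / d for x in row] for row in adjugate(A)]


A = [[1, 2, -1], [5, 0, 3], [1, 2, 1]]
A_inv = inverse(A)
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*A_inv)] for row in A]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```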




For example, if A is the matrix 



0 1 

2 0 , 

1 1 


then det(A) = −3, and


Hence, 



-2 

-3 

4 




2 

3 


0 

2 

3 


1 



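Some properties of the inverse operation are given below:

1. $(AB)^{-1} = B^{-1}A^{-1}$.

2. $(A')^{-1} = (A^{-1})'$.

3. $\det(A^{-1}) = 1/\det(A)$.

4. $(A^{-1})^{-1} = A$.

5. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$.

6. $(A \oplus B)^{-1} = A^{-1} \oplus B^{-1}$.

7. If A is partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{ij}$ is of order $n_i \times n_j$ $(i, j = 1, 2)$, then

$$\det(A) = \begin{cases}
\det(A_{11}) \cdot \det(A_{22} - A_{21}A_{11}^{-1}A_{12}) & \text{if } A_{11} \text{ is nonsingular}, \\
\det(A_{22}) \cdot \det(A_{11} - A_{12}A_{22}^{-1}A_{21}) & \text{if } A_{22} \text{ is nonsingular}.
\end{cases}$$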





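The inverse of A is partitioned as

$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

where

$$\begin{aligned}
B_{11} &= (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}, \\
B_{12} &= -B_{11}A_{12}A_{22}^{-1}, \\
B_{21} &= -A_{22}^{-1}A_{21}B_{11}, \\
B_{22} &= A_{22}^{-1} + A_{22}^{-1}A_{21}B_{11}A_{12}A_{22}^{-1}.
\end{aligned}$$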
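The block formulas for $A^{-1}$ can be checked against a direct inverse. The sketch below assumes, as the formulas require, that $A_{22}$ and $A_{11} - A_{12}A_{22}^{-1}A_{21}$ are nonsingular. It uses Gauss-Jordan elimination with exact fractions for all inverses, and the 2 × 2 blocks are chosen only for illustration.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(A, B)]


def inv(A):
    """Inverse by Gauss-Jordan elimination in exact arithmetic (A must be nonsingular)."""
    n = len(A)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c] != 0)  # StopIteration if singular
        m[c], m[piv] = m[piv], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]


def partitioned_inverse(A11, A12, A21, A22):
    """Assemble A^{-1} from the block formulas B11, B12, B21, B22."""
    A22i = inv(A22)
    B11 = inv(sub(A11, matmul(matmul(A12, A22i), A21)))
    B12 = [[-x for x in row] for row in matmul(matmul(B11, A12), A22i)]
    B21 = [[-x for x in row] for row in matmul(matmul(A22i, A21), B11)]
    B22 = add(A22i, matmul(matmul(matmul(matmul(A22i, A21), B11), A12), A22i))
    top = [r1 + r2 for r1, r2 in zip(B11, B12)]
    bottom = [r1 + r2 for r1, r2 in zip(B21, B22)]
    return top + bottom


A11 = [[4, 1], [2, 3]]
A12 = [[1, 0], [0, 1]]
A21 = [[0, 1], [1, 0]]
A22 = [[2, 0], [0, 5]]
A = [r1 + r2 for r1, r2 in zip(A11, A12)] + [r1 + r2 for r1, r2 in zip(A21, A22)]
print(partitioned_inverse(A11, A12, A21, A22) == inv(A))  # True
```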


2.3.4. Generalized Inverse of a Matrix

This inverse represents a more general concept than the one discussed in the
previous section. Let A be a matrix of order m × n. Then, a generalized
inverse of A, denoted by $A^-$, is a matrix of order n × m that satisfies the
condition

$$AA^-A = A. \tag{2.4}$$

Note that $A^-$ is defined even if A is not a square matrix. If A is a square
matrix, it does not have to be nonsingular. Furthermore, condition (2.4) can
be satisfied by infinitely many matrices (see, for example, Searle, 1982,
Chapter 8). If A is nonsingular, then (2.4) is satisfied by only $A^{-1}$. Thus $A^{-1}$
is a special case of $A^-$.

Theorem 2.3.3.

1. If A is a symmetric matrix, then $A^-$ can be chosen to be symmetric.

2. $A(A'A)^-A'A = A$ for any matrix A.

3. $A(A'A)^-A'$ is invariant to the choice of a generalized inverse of A'A.

Proof. See Searle (1982, pages 221-222). □
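One simple way to produce a generalized inverse, assuming the m × n matrix A has rank r and its leading r × r submatrix $A_{11}$ is nonsingular, is to place $A_{11}^{-1}$ in the top-left corner of an n × m zero matrix. The sketch below uses that construction, which is one of many valid choices (see Searle, 1982, Chapter 8, for general algorithms). It checks condition (2.4) for an illustrative 3 × 4 matrix of rank 2, and it checks part 2 of Theorem 2.3.3 by applying the same construction to A'A.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inv(A):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination (exact)."""
    n = len(A)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c] != 0)  # StopIteration if singular
        m[c], m[piv] = m[piv], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]


def g_inverse(A, r):
    """A generalized inverse of an m x n matrix A of rank r, ASSUMING its leading
    r x r submatrix A11 is nonsingular: A^- = [[A11^{-1}, 0], [0, 0]] (n x m)."""
    m, n = len(A), len(A[0])
    A11_inv = inv([row[:r] for row in A[:r]])
    G = [[Fraction(0)] * m for _ in range(n)]
    for i in range(r):
        for j in range(r):
            G[i][j] = A11_inv[i][j]
    return G


A = [[1, 2, 3, 4], [0, 1, 1, 2], [1, 3, 4, 6]]  # third row = first + second, so r(A) = 2
G = g_inverse(A, 2)
print(matmul(matmul(A, G), A) == A)  # condition (2.4)

# Theorem 2.3.3, part 2: A (A'A)^- A'A = A, with the same construction for (A'A)^-
At = transpose(A)
H = g_inverse(matmul(At, A), 2)
print(matmul(matmul(matmul(A, H), At), A) == A)
```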


2.3.5. Eigenvalues and Eigenvectors of a Matrix

Let A be a square matrix of order n × n. By definition, a scalar λ is said to be
an eigenvalue (or characteristic root) of A if $A - \lambda I_n$ is a singular matrix, that
is,

$$\det(A - \lambda I_n) = 0. \tag{2.5}$$





Thus an eigenvalue of A satisfies a polynomial equation of degree n called
the characteristic equation of A. If λ is a multiple solution (or root) of
equation (2.5), that is, (2.5) has several roots, say m, that are equal to λ, then
λ is said to be an eigenvalue of multiplicity m.

Since $r(A - \lambda I_n) < n$ by the fact that $A - \lambda I_n$ is singular, the columns of
$A - \lambda I_n$ must be linearly related. Hence, there exists a nonzero vector v such
that

$$(A - \lambda I_n)v = 0, \tag{2.6}$$

or equivalently,

$$Av = \lambda v. \tag{2.7}$$

A nonzero vector v satisfying (2.7) is called an eigenvector (or a characteristic vector)
corresponding to the eigenvalue λ. From (2.7) we note that the linear
transformation of v by the matrix A is a scalar multiple of v.

The following theorems describe certain properties associated with eigenvalues
and eigenvectors. The proofs of these theorems can be found in
standard matrix algebra books (see the annotated bibliography).

Theorem 2.3.4. A square matrix $A$ is singular if and only if at least one of
its eigenvalues is equal to zero. In particular, if $A$ is symmetric, then its rank
is equal to the number of its nonzero eigenvalues.

Theorem 2.3.5. The eigenvalues of a symmetric matrix are real.

Theorem 2.3.6. Let $A$ be a square matrix, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ denote its
distinct eigenvalues. If $v_1, v_2, \ldots, v_k$ are eigenvectors of $A$ corresponding
to $\lambda_1, \lambda_2, \ldots, \lambda_k$, respectively, then $v_1, v_2, \ldots, v_k$ are linearly independent. In
particular, if $A$ is symmetric, then $v_1, v_2, \ldots, v_k$ are orthogonal to one another,
that is, $v_i'v_j = 0$ for $i \neq j$ $(i, j = 1, 2, \ldots, k)$.

Theorem 2.3.7. Let $A$ and $B$ be two matrices of orders $m \times m$ and $n \times n$,
respectively. Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the eigenvalues of $A$, and $\nu_1, \nu_2, \ldots, \nu_n$ be
the eigenvalues of $B$. Then we have the following:

1. The eigenvalues of $A \otimes B$ are of the form $\lambda_i\nu_j$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$.

2. The eigenvalues of $A \oplus B$ are $\lambda_1, \lambda_2, \ldots, \lambda_m;\ \nu_1, \nu_2, \ldots, \nu_n$.
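Both parts of Theorem 2.3.7 can be checked numerically. In this Python/NumPy sketch the matrices are hypothetical and chosen symmetric only so that all eigenvalues are real and easy to sort; the theorem itself holds for any square $A$ and $B$. We assume that $A \oplus B$ denotes the block-diagonal matrix $\operatorname{Diag}(A, B)$.

```python
import numpy as np

# Hypothetical symmetric A (2 x 2) and B (3 x 3)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[4.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
lam = np.linalg.eigvalsh(A)
nu = np.linalg.eigvalsh(B)

# Part 1: the eigenvalues of A (x) B are all products lam_i * nu_j
kron_eigs = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
products = np.sort(np.outer(lam, nu).ravel())
print(np.allclose(kron_eigs, products))   # True

# Part 2: the direct sum Diag(A, B) has eigenvalues lam_1, ..., lam_m; nu_1, ..., nu_n
direct_sum = np.block([[A, np.zeros((2, 3))], [np.zeros((3, 2)), B]])
sum_eigs = np.sort(np.linalg.eigvalsh(direct_sum))
print(np.allclose(sum_eigs, np.sort(np.concatenate([lam, nu]))))  # True
```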

Theorem 2.3.8. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of a matrix $A$ of
order $n \times n$. Then the following hold:

1. $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$.

2. $\det(A) = \prod_{i=1}^{n} \lambda_i$.
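The defining relation (2.7) and Theorem 2.3.8 can be checked on a hypothetical nonsymmetric matrix with Python/NumPy. Its eigenvalues may be complex, but they occur in conjugate pairs, so their sum and product are real.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))   # hypothetical, not symmetric

vals, vecs = np.linalg.eig(A)
# (2.7): every column v of vecs satisfies A v = lambda v
print(np.allclose(A @ vecs, vecs * vals))                 # True
# Theorem 2.3.8: trace = sum of eigenvalues, determinant = product of eigenvalues
print(np.isclose(np.trace(A), vals.sum().real))           # True
print(np.isclose(np.linalg.det(A), np.prod(vals).real))   # True
```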





Theorem 2.3.9. Let $A$ and $B$ be two matrices of orders $m \times n$ and $n \times m$
$(n \ge m)$, respectively. The nonzero eigenvalues of $BA$ are the same as those
of $AB$.
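One way to see Theorem 2.3.9 numerically, sketched here in Python/NumPy with hypothetical random matrices, is through the identity $\det(tI_n - BA) = t^{n-m}\det(tI_m - AB)$: the characteristic polynomial of $BA$ equals that of $AB$ multiplied by $t^{n-m}$, so $BA$ has the eigenvalues of $AB$ plus $n - m$ zeros. Comparing polynomial coefficients avoids having to match possibly complex eigenvalues one by one.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 4
A = rng.standard_normal((m, n))   # hypothetical m x n matrix
B = rng.standard_normal((n, m))   # hypothetical n x m matrix

# Characteristic polynomial coefficients, leading coefficient 1 first
p_AB = np.poly(A @ B)   # degree m
p_BA = np.poly(B @ A)   # degree n

# Multiplying by t^(n-m) appends n - m zero coefficients
print(np.allclose(p_BA, np.concatenate([p_AB, np.zeros(n - m)])))  # True
```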


2.3.6. Some Special Matrices

1. The vector $1_n$ is a column vector of ones of order $n \times 1$.

2. The matrix $J_n$ is a matrix of ones of order $n \times n$.

3. Idempotent Matrix. A square matrix $A$ for which $A^2 = A$ is called an
idempotent matrix. For example, the matrix $A = I_n - (1/n)J_n$ is idempotent
of order $n \times n$. The eigenvalues of an idempotent matrix are
equal to zeros and ones. It follows from Theorem 2.3.8 that the rank of
an idempotent matrix, which is the same as the number of eigenvalues
that are equal to 1, is also equal to its trace. Idempotent matrices are
used in many applications in statistics (see Section 2.4).

4. Orthogonal Matrix. A square matrix $A$ is orthogonal if $A'A = I$. From
this definition it follows that (i) $A$ is orthogonal if and only if $A' = A^{-1}$;
(ii) $|\det(A)| = 1$. A special orthogonal matrix is the Householder matrix,
which is a symmetric matrix of the form

$$H = I - 2uu'/u'u,$$

where $u$ is a nonzero vector. Orthogonal matrices occur in many
applications of matrix algebra and play an important role in statistics,
as will be seen in Section 2.4.
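The properties listed for the two examples above can be verified with a short Python/NumPy sketch. The order $n = 5$ and the vector $u$ are hypothetical choices.

```python
import numpy as np

n = 5
I, J = np.eye(n), np.ones((n, n))

# Idempotent example from the text: A = I_n - (1/n) J_n
A = I - J / n
print(np.allclose(A @ A, A))                                  # True: A^2 = A
print(np.allclose(np.linalg.eigvalsh(A), [0, 1, 1, 1, 1]))    # True: eigenvalues 0 and 1
print(np.linalg.matrix_rank(A), int(round(np.trace(A))))      # 4 4: rank equals trace

# Householder matrix H = I - 2uu'/u'u for a hypothetical nonzero u
u = np.array([1.0, -2.0, 0.5, 3.0, 1.0])
H = I - 2 * np.outer(u, u) / (u @ u)
print(np.allclose(H, H.T), np.allclose(H.T @ H, I))           # True True
print(np.isclose(np.linalg.det(H), -1.0))                     # True: a reflection, |det(H)| = 1
```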


2.3.7. The Diagonalization of a Matrix

Theorem 2.3.10 (The Spectral Decomposition Theorem). Let $A$ be a
symmetric matrix of order $n \times n$. There exists an orthogonal matrix $P$ such
that $A = P\Lambda P'$, where $\Lambda = \operatorname{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is a diagonal matrix whose
diagonal elements are the eigenvalues of $A$. The columns of $P$ are the
corresponding orthonormal eigenvectors of $A$.

Proof. See Basilevsky (1983, Theorem 5.8, page 200). □

If $P$ is partitioned as $P = [p_1 : p_2 : \cdots : p_n]$, where $p_i$ is an eigenvector of $A$
with eigenvalue $\lambda_i$ $(i = 1, 2, \ldots, n)$, then $A$ can be written as

$$A = \sum_{i=1}^{n} \lambda_i p_i p_i'.$$


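For example, if

$$A = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 0 & 0 \\ -2 & 0 & 4 \end{bmatrix},$$

then $A$ has two distinct eigenvalues, $\lambda_1 = 0$ of multiplicity 2 and $\lambda_2 = 5$. For
$\lambda_1 = 0$ we have two orthonormal eigenvectors, $p_1 = (2, 0, 1)'/\sqrt{5}$ and
$p_2 = (0, 1, 0)'$. Note that $p_1$ and $p_2$ span the kernel (null space) of the linear
transformation represented by $A$. For $\lambda_2 = 5$ we have the normalized eigenvector
$p_3 = (1, 0, -2)'/\sqrt{5}$, which is orthogonal to both $p_1$ and $p_2$. Hence, $P$ and $\Lambda$
in Theorem 2.3.10 for the matrix $A$ are

$$P = \begin{bmatrix} 2/\sqrt{5} & 0 & 1/\sqrt{5} \\ 0 & 1 & 0 \\ 1/\sqrt{5} & 0 & -2/\sqrt{5} \end{bmatrix}, \qquad \Lambda = \operatorname{Diag}(0, 0, 5).$$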
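The example can be verified in Python/NumPy. The sketch uses $A = 5p_3p_3' = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 0 & 0 \\ -2 & 0 & 4 \end{bmatrix}$, which is the matrix determined by the eigenvalues and eigenvectors listed in the example, and checks both the factored form $A = P\Lambda P'$ and the rank-one expansion $A = \sum_i \lambda_i p_i p_i'$.

```python
import numpy as np

A = np.array([[1.0, 0.0, -2.0],
              [0.0, 0.0, 0.0],
              [-2.0, 0.0, 4.0]])
r5 = np.sqrt(5.0)
P = np.array([[2 / r5, 0.0, 1 / r5],
              [0.0, 1.0, 0.0],
              [1 / r5, 0.0, -2 / r5]])      # columns p1, p2, p3 from the example
lam = np.array([0.0, 0.0, 5.0])

print(np.allclose(P.T @ P, np.eye(3)))           # True: P is orthogonal
print(np.allclose(P @ np.diag(lam) @ P.T, A))    # True: A = P Lambda P'
rank_one_sum = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(3))
print(np.allclose(rank_one_sum, A))              # True: A = sum_i lambda_i p_i p_i'
print(np.allclose(np.linalg.eigvalsh(A), [0.0, 0.0, 5.0]))  # True
```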


The next theorem gives a more general form of the spectral decomposition
theorem.

Theorem 2.3.11 (The Singular-Value Decomposition Theorem). Let $A$ be
a matrix of order $m \times n$ $(m \le n)$ and rank $r$. There exist orthogonal matrices
$P$ and $Q$ such that $A = P[D : 0]Q'$, where $D = \operatorname{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ is a diagonal
matrix with nonnegative diagonal elements called the singular values of
$A$, and $0$ is a zero matrix of order $m \times (n - m)$. The diagonal elements of $D$
are the square roots of the eigenvalues of $AA'$.

Proof. See, for example, Searle (1982, pages 316-317). □
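NumPy's `np.linalg.svd` returns the factors of Theorem 2.3.11 directly, with $P$ = `U` and $Q'$ = `Vt` in the notation above. The sketch below, using a hypothetical $3 \times 5$ matrix, rebuilds $A$ from $P[D : 0]Q'$ and checks that the squared singular values are the eigenvalues of $AA'$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5
A = rng.standard_normal((m, n))            # hypothetical m x n matrix, m <= n

U, s, Vt = np.linalg.svd(A)                # full matrices: U is m x m, Vt is n x n
D0 = np.hstack([np.diag(s), np.zeros((m, n - m))])   # the matrix [D : 0]
print(np.allclose(U @ D0 @ Vt, A))                   # True: A = P[D : 0]Q'
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A @ A.T)))  # True
```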


2.3.8. Quadratic Forms

Let $A = (a_{ij})$ be a symmetric matrix of order $n \times n$, and let $x = (x_1, x_2, \ldots, x_n)'$
be a column vector of order $n \times 1$. The function

$$q(x) = x'Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$$

is called a quadratic form in $x$.
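The two expressions for $q(x)$ agree term by term, as this small Python/NumPy check with a hypothetical symmetric matrix and vector shows.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])   # hypothetical symmetric matrix
x = np.array([1.0, 2.0, 3.0])

matrix_form = x @ A @ x
double_sum = sum(A[i, j] * x[i] * x[j] for i in range(3) for j in range(3))
print(matrix_form, double_sum)   # 12.0 12.0
```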





A quadratic form $x'Ax$ is said to be the following:

1. Positive definite if $x'Ax > 0$ for all $x \neq 0$ (so that $x'Ax$ is zero only if $x = 0$).

2. Positive semidefinite if $x'Ax \ge 0$ for all $x$ and $x'Ax = 0$ for at least one
nonzero value of $x$.

3. Nonnegative definite if $x'Ax$ is either positive definite or positive semidefinite.

Theorem 2.3.12. Let $A = (a_{ij})$ be a symmetric matrix of order $n \times n$.
Then $A$ is positive definite if and only if either of the following two conditions
is satisfied:

1. The eigenvalues of $A$ are all positive.

2. The leading principal minors of $A$ are all positive, that is,

$$a_{11} > 0, \quad \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} > 0, \quad \ldots, \quad \det(A) > 0.$$

Proof. The proof of part 1 follows directly from the spectral decomposition
theorem. For the proof of part 2, see Lancaster (1969, Theorem 2.14.4). □

Theorem 2.3.13. Let $A = (a_{ij})$ be a symmetric matrix of order $n \times n$.
Then $A$ is positive semidefinite if and only if its eigenvalues are nonnegative
with at least one of them equal to zero.

Proof. See Basilevsky (1983, Theorem 5.10, page 203). □
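Both conditions of Theorem 2.3.12 are easy to apply numerically. The Python/NumPy sketch below uses two hypothetical matrices; the function names are ours. In floating point, a sign test on a determinant close to zero is unreliable, so in practice an attempted Cholesky factorization (`np.linalg.cholesky`, which raises `LinAlgError` for a matrix that is not positive definite) is often preferred.

```python
import numpy as np

def is_pd_by_eigenvalues(A):
    """Condition 1: all eigenvalues of the symmetric matrix A are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

def is_pd_by_minors(A):
    """Condition 2: all leading principal minors of A are positive."""
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])  # hypothetical
B = np.array([[1.0, 2.0], [2.0, 1.0]])                                    # eigenvalues 3, -1
print(is_pd_by_eigenvalues(A), is_pd_by_minors(A))   # True True
print(is_pd_by_eigenvalues(B), is_pd_by_minors(B))   # False False
```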


2.3.9. The Simultaneous Diagonalization of Matrices 

By simultaneous diagonalization we mean finding a matrix, say Q, that can 
reduce several square matrices to a diagonal form. In many situations there 
may be a need to diagonalize several matrices simultaneously. This occurs 
frequently in statistics, particularly in analysis of variance. 

The proofs of the following theorems can be found in Graybill (1983, 
Chapter 12). 

Theorem 2.3.14. Let $A$ and $B$ be symmetric matrices of order $n \times n$.

1. If $A$ is positive definite, then there exists a nonsingular matrix $Q$ such
that $Q'AQ = I_n$ and $Q'BQ = D$, where $D$ is a diagonal matrix whose
diagonal elements are the roots of the polynomial equation $\det(B - \lambda A) = 0$.

2. If $A$ and $B$ are positive semidefinite, then there exists a nonsingular
matrix $Q$ such that

$$Q'AQ = D_1, \qquad Q'BQ = D_2,$$

where $D_1$ and $D_2$ are diagonal matrices (for a detailed proof of this
result, see Newcomb, 1960).
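For part 1, one standard construction (our choice, not necessarily the proof in Graybill, 1983) factors $A = LL'$ by Cholesky, diagonalizes the symmetric matrix $C = L^{-1}BL'^{-1} = R\,\operatorname{Diag}(d)\,R'$, and sets $Q = L'^{-1}R$. The Python/NumPy sketch below, with hypothetical $A$ and $B$, checks $Q'AQ = I_n$, $Q'BQ = D$, and that the $d_i$ are the eigenvalues of $A^{-1}B$, which are the roots of $\det(B - \lambda A) = 0$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # hypothetical positive definite A
S = rng.standard_normal((n, n))
B = (S + S.T) / 2                    # hypothetical symmetric B

L = np.linalg.cholesky(A)            # A = L L'
L_inv = np.linalg.inv(L)
C = L_inv @ B @ L_inv.T              # symmetric
d, R = np.linalg.eigh(C)             # C = R diag(d) R', d ascending
Q = L_inv.T @ R                      # nonsingular

print(np.allclose(Q.T @ A @ Q, np.eye(n)))   # True
print(np.allclose(Q.T @ B @ Q, np.diag(d)))  # True
# det(B - lambda A) = 0 exactly when lambda is an eigenvalue of A^{-1} B
roots = np.sort(np.linalg.eigvals(np.linalg.solve(A, B)).real)
print(np.allclose(roots, d))                 # True
```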


Theorem 2.3.15. Let $A_1, A_2, \ldots, A_k$ be symmetric matrices of order $n \times n$.
Then there exists an orthogonal matrix $P$ such that

$$A_i = P\Lambda_iP', \qquad i = 1, 2, \ldots, k,$$

where $\Lambda_i$ is a diagonal matrix, if and only if $A_iA_j = A_jA_i$ for all $i \neq j$
$(i, j = 1, 2, \ldots, k)$.


2.3.10. Bounds on Eigenvalues

Let $A$ be a symmetric matrix of order $n \times n$. We denote the $i$th eigenvalue of
$A$ by $e_i(A)$, $i = 1, 2, \ldots, n$. The smallest and largest eigenvalues of $A$ are
denoted by $e_{\min}(A)$ and $e_{\max}(A)$, respectively.

Theorem 2.3.16. $e_{\min}(A) \le x'Ax/x'x \le e_{\max}(A)$ for all $x \neq 0$.

Proof. This follows directly from the spectral decomposition theorem. □

The ratio $x'Ax/x'x$ is called Rayleigh's quotient for $A$. The lower and
upper bounds in Theorem 2.3.16 can be achieved by choosing $x$ to be an
eigenvector associated with $e_{\min}(A)$ and $e_{\max}(A)$, respectively. Thus Theorem
2.3.16 implies that

$$\inf_{x \neq 0} \frac{x'Ax}{x'x} = e_{\min}(A), \tag{2.8}$$

$$\sup_{x \neq 0} \frac{x'Ax}{x'x} = e_{\max}(A). \tag{2.9}$$
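A quick Monte Carlo check of Theorem 2.3.16, together with the attainment in (2.8) and (2.9), can be sketched in Python/NumPy. The symmetric matrix and the random vectors are hypothetical, and a small tolerance absorbs rounding.

```python
import numpy as np

rng = np.random.default_rng(6)
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                          # hypothetical symmetric matrix
evals, evecs = np.linalg.eigh(A)           # ascending: evals[0] = e_min(A), evals[-1] = e_max(A)

def rayleigh(A, x):
    """Rayleigh's quotient x'Ax / x'x."""
    return (x @ A @ x) / (x @ x)

# Theorem 2.3.16 on many random nonzero vectors x (the rows of X)
X = rng.standard_normal((10000, 4))
q = np.einsum('ij,jk,ik->i', X, A, X) / np.einsum('ij,ij->i', X, X)
tol = 1e-12
print(evals[0] - tol <= q.min() and q.max() <= evals[-1] + tol)   # True

# (2.8) and (2.9): the bounds are attained at the corresponding eigenvectors
print(np.isclose(rayleigh(A, evecs[:, 0]), evals[0]))    # True
print(np.isclose(rayleigh(A, evecs[:, -1]), evals[-1]))  # True
```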


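Theorem 2.3.17. If $A$ is a symmetric matrix and $B$ is a positive definite
matrix, both of order $n \times n$, then

$$e_{\min}(B^{-1}A) \le \frac{x'Ax}{x'Bx} \le e_{\max}(B^{-1}A) \quad \text{for all } x \neq 0.$$

Proof. The proof is left to the reader. □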





Note that the above lower and upper bounds are equal to the infimum and
supremum, respectively, of the ratio $x'Ax/x'Bx$ for $x \neq 0$.
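Theorem 2.3.17 can be illustrated in Python/NumPy with hypothetical $A$ and $B$. Writing $B = LL'$, the matrix $B^{-1}A$ is similar to the symmetric matrix $L^{-1}AL'^{-1}$, so its eigenvalues are real and can be computed with `eigh`. If $w$ is an eigenvector of $L^{-1}AL'^{-1}$, then $v = L'^{-1}w$ is an eigenvector of $B^{-1}A$ at which the ratio attains the corresponding bound.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
S = rng.standard_normal((n, n))
A = (S + S.T) / 2                             # hypothetical symmetric A
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)                       # hypothetical positive definite B

L = np.linalg.cholesky(B)                     # B = L L'
L_inv = np.linalg.inv(L)
e, W = np.linalg.eigh(L_inv @ A @ L_inv.T)    # eigenvalues of B^{-1}A, ascending
print(np.allclose(np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real), e))  # True

X = rng.standard_normal((10000, n))
ratio = np.einsum('ij,jk,ik->i', X, A, X) / np.einsum('ij,jk,ik->i', X, B, X)
print(e[0] - 1e-12 <= ratio.min() and ratio.max() <= e[-1] + 1e-12)  # True

# The upper bound is attained at v = L'^{-1} w for the top eigenvector w
v = L_inv.T @ W[:, -1]
print(np.isclose((v @ A @ v) / (v @ B @ v), e[-1]))   # True
```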


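Theorem 2.3.18. If $A$ is a positive semidefinite matrix and $B$ is a positive
definite matrix, both of order $n \times n$, then for any $i$ $(i = 1, 2, \ldots, n)$, where the
eigenvalues of $A$, $B$, and $AB$ are each arranged in the same order (say, descending),

$$e_i(A)\,e_{\min}(B) \le e_i(AB) \le e_i(A)\,e_{\max}(B). \tag{2.10}$$

Furthermore, if $A$ is positive definite, then for any $i$ $(i = 1, 2, \ldots, n)$,

$$\frac{e_i^2(AB)}{e_{\max}(A)\,e_{\max}(B)} \le e_i(A)\,e_i(B) \le \frac{e_i^2(AB)}{e_{\min}(A)\,e_{\min}(B)}.$$

Proof. See Anderson and Gupta (1963, Corollary 2.2.1). □

A special case of the double inequality in (2.10) is

$$e_{\min}(A)\,e_{\min}(B) \le e_i(AB) \le e_{\max}(A)\,e_{\max}(B),$$

for all $i$ $(i = 1, 2, \ldots, n)$.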


Theorem 2.3.19. Let $A$ and $B$ be symmetric matrices of order $n \times n$.
Then, the following hold:

1. $e_i(A) \le e_i(A + B)$, $i = 1, 2, \ldots, n$, if $B$ is nonnegative definite.

2. $e_i(A) < e_i(A + B)$, $i = 1, 2, \ldots, n$, if $B$ is positive definite.

Proof. See Bellman (1970, Theorem 3, page 117). □


Theorem 2.3.20 (Schur's Theorem). Let $A = (a_{ij})$ be a symmetric matrix
of order $n \times n$, and let $\|A\|_2$ denote its Euclidean norm, defined as

$$\|A\|_2 = \left(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2\right)^{1/2}.$$

Then

$$\sum_{i=1}^{n} e_i^2(A) = \|A\|_2^2.$$

Proof. See Lancaster (1969, Theorem 7.3.1). □

Since $\|A\|_2 \le n\max_{i,j}|a_{ij}|$, then from Theorem 2.3.20 we conclude that

$$|e_{\max}(A)| \le n\max_{i,j}|a_{ij}|.$$
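Schur's theorem and the resulting bound can be checked in Python/NumPy on a hypothetical symmetric matrix. Note that the Euclidean norm $\|A\|_2$ of the text is what NumPy calls the Frobenius norm (`np.linalg.norm(A, 'fro')`); NumPy's `np.linalg.norm(A, 2)` is the spectral norm, a different quantity.

```python
import numpy as np

rng = np.random.default_rng(8)
S = rng.standard_normal((5, 5))
A = (S + S.T) / 2                                # hypothetical symmetric matrix

eig = np.linalg.eigvalsh(A)
norm_sq = np.linalg.norm(A, 'fro') ** 2          # ||A||_2^2 in the notation of the text
print(np.isclose(np.sum(eig ** 2), norm_sq))     # True: Theorem 2.3.20
print(abs(eig[-1]) <= A.shape[0] * np.abs(A).max())   # True: bound on |e_max(A)|
```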



APPLICATIONS OF MATRICES IN STATISTICS 




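Theorem 2.3.21. Let $A$ be a symmetric matrix of order $n \times n$, and let $m$
and $s$ be defined as

$$m = \frac{\operatorname{tr}(A)}{n}, \qquad s = \left[\frac{\operatorname{tr}(A^2)}{n} - m^2\right]^{1/2}.$$

Then

$$\begin{aligned}
m - s(n-1)^{1/2} &\le e_{\min}(A) \le m - \frac{s}{(n-1)^{1/2}}, \\
m + \frac{s}{(n-1)^{1/2}} &\le e_{\max}(A) \le m + s(n-1)^{1/2}, \\
e_{\max}(A) - e_{\min}(A) &\le s(2n)^{1/2}.
\end{aligned}$$

Proof. See Wolkowicz and Styan (1980, Theorems 2.1 and 2.5). □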
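The trace-based bounds are cheap to compute because they need only $\operatorname{tr}(A)$ and $\operatorname{tr}(A^2)$: $m$ and $s^2$ are the mean and the (divisor $n$) variance of the eigenvalues. The Python/NumPy sketch below checks them against exact eigenvalues of a hypothetical symmetric matrix, using a small tolerance for rounding.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 6
S = rng.standard_normal((n, n))
A = (S + S.T) / 2                       # hypothetical symmetric matrix

m = np.trace(A) / n
s = np.sqrt(np.trace(A @ A) / n - m**2)
e = np.linalg.eigvalsh(A)               # ascending
e_min, e_max = e[0], e[-1]
r = np.sqrt(n - 1)
tol = 1e-12

print(m - s * r - tol <= e_min <= m - s / r + tol)   # True
print(m + s / r - tol <= e_max <= m + s * r + tol)   # True
print(e_max - e_min <= s * np.sqrt(2 * n) + tol)     # True
```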


2.4. APPLICATIONS OF MATRICES IN STATISTICS 

The use of matrix algebra is quite prevalent in statistics. In fact, in the areas
of experimental design, linear models, and multivariate analysis, matrix
algebra is considered the most frequently used branch of mathematics.
Applications of matrices in these areas are well documented in several books,
for example, Basilevsky (1983), Graybill (1983), Magnus and Neudecker
(1988), and Searle (1982). We shall therefore not attempt to duplicate the
material given in these books.

Let us consider the following applications: 


2.4.1. The Analysis of the Balanced Mixed Model 

In analysis of variance, a linear model associated with a given experimental
situation is said to be balanced if the numbers of observations in the
subclasses of the data are the same. For example, the two-way crossed-classification
model with interaction,

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}, \tag{2.11}$$

$i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, n$, is balanced, since there are $n$
observations for each combination of $i$ and $j$. Here, $\alpha_i$ and $\beta_j$ represent the
main effects of the factors under consideration, $(\alpha\beta)_{ij}$ denotes the interaction
effect, and $\epsilon_{ijk}$ is a random error term. Model (2.11) can be written in
vector form as

$$y = H_0\tau_0 + H_1\tau_1 + H_2\tau_2 + H_3\tau_3 + H_4\tau_4, \tag{2.12}$$





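where $y$ is the vector of observations, $\tau_0 = \mu$, $\tau_1 = (\alpha_1, \alpha_2, \ldots, \alpha_a)'$,
$\tau_2 = (\beta_1, \beta_2, \ldots, \beta_b)'$, $\tau_3 = [(\alpha\beta)_{11}, (\alpha\beta)_{12}, \ldots, (\alpha\beta)_{ab}]'$, and
$\tau_4 = (\epsilon_{111}, \epsilon_{112}, \ldots, \epsilon_{abn})'$. The matrices $H_i$ $(i = 0, 1, 2, 3, 4)$ can be expressed as
direct products of the form

$$\begin{aligned}
H_0 &= 1_a \otimes 1_b \otimes 1_n, \\
H_1 &= I_a \otimes 1_b \otimes 1_n, \\
H_2 &= 1_a \otimes I_b \otimes 1_n, \\
H_3 &= I_a \otimes I_b \otimes 1_n, \\
H_4 &= I_a \otimes I_b \otimes I_n.
\end{aligned}$$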
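The direct-product form of the $H_i$'s can be checked against the scalar model (2.11). The Python/NumPy sketch below uses hypothetical sizes $a = 2$, $b = 3$, $n = 2$ and random parameter values; the helper names `ones_col` and `kron3` are ours. It assumes the observations in $y$ are ordered with $k$ varying fastest, then $j$, then $i$, which is the ordering the direct products produce.

```python
import numpy as np

def ones_col(k):
    """The column vector 1_k of order k x 1."""
    return np.ones((k, 1))

def kron3(X, Y, Z):
    """Direct (Kronecker) product X (x) Y (x) Z."""
    return np.kron(np.kron(X, Y), Z)

a, b, n = 2, 3, 2                       # hypothetical numbers of levels and replicates
Ia, Ib, In = np.eye(a), np.eye(b), np.eye(n)
H = [kron3(ones_col(a), ones_col(b), ones_col(n)),   # H0, for mu
     kron3(Ia, ones_col(b), ones_col(n)),            # H1, for alpha_i
     kron3(ones_col(a), Ib, ones_col(n)),            # H2, for beta_j
     kron3(Ia, Ib, ones_col(n)),                     # H3, for (alpha beta)_ij
     kron3(Ia, Ib, In)]                              # H4, for epsilon_ijk

rng = np.random.default_rng(10)
tau = [rng.standard_normal(Hl.shape[1]) for Hl in H]   # hypothetical parameter values
y = sum(Hl @ t for Hl, t in zip(H, tau))               # vector form (2.12)

# Element-by-element form (2.11)
mu, alpha, beta = tau[0][0], tau[1], tau[2]
ab, eps = tau[3].reshape(a, b), tau[4].reshape(a, b, n)
y_direct = np.array([mu + alpha[i] + beta[j] + ab[i, j] + eps[i, j, k]
                     for i in range(a) for j in range(b) for k in range(n)])
print(np.allclose(y, y_direct))   # True
```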

In general, any balanced linear model can be written in vector form as

$$y = \sum_{l=0}^{\nu} H_l\tau_l, \tag{2.13}$$

where $H_l$ $(l = 0, 1, \ldots, \nu)$ is a direct product of identity matrices and vectors
of ones (see Khuri, 1982). If $\tau_0, \tau_1, \ldots, \tau_\theta$ $(\theta < \nu - 1)$ are fixed unknown
parameter vectors (fixed effects), and $\tau_{\theta+1}, \tau_{\theta+2}, \ldots, \tau_\nu$ are random vectors
(random effects), then model (2.13) is called a balanced mixed model.
Furthermore, if we assume that the random effects are independent and have
the normal distributions $N(0, \sigma_l^2 I_{c_l})$, where $c_l$ is the number of columns of
$H_l$, $l = \theta+1, \theta+2, \ldots, \nu$, then, because model (2.13) is balanced, its statistical
analysis becomes very simple. Here, the $\sigma_l^2$'s are called the model's
variance components. A balanced mixed model can be written as

$$y = Xg + Zh, \tag{2.14}$$

where $Xg = \sum_{l=0}^{\theta} H_l\tau_l$ is the fixed portion of the model, and $Zh = \sum_{l=\theta+1}^{\nu} H_l\tau_l$
is its random portion. The variance-covariance matrix of $y$ is given by

$$\Sigma = \sum_{l=\theta+1}^{\nu} A_l\sigma_l^2,$$

where $A_l = H_lH_l'$ $(l = \theta+1, \theta+2, \ldots, \nu)$. Note that $A_lA_p = A_pA_l$ for all
$l \neq p$. Hence, the matrices $A_l$ can be diagonalized simultaneously (see
Theorem 2.3.15).
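The commutativity of the $A_l = H_lH_l'$ can be seen concretely for the two-way model (2.11). Each $A_l$ is a direct product of matrices $I_k$ and $J_k = 1_k1_k'$, and $I_k$ and $J_k$ commute, so all pairs commute. The Python/NumPy sketch below uses hypothetical sizes; as an illustrative assumption it treats $\mu$, $\alpha_i$, $\beta_j$ as fixed ($\theta = 2$) and the interaction and error as random with hypothetical variance components.

```python
import numpy as np
from itertools import combinations

def ones_col(k):
    """The column vector 1_k of order k x 1."""
    return np.ones((k, 1))

def kron3(X, Y, Z):
    """Direct (Kronecker) product X (x) Y (x) Z."""
    return np.kron(np.kron(X, Y), Z)

a, b, n = 2, 3, 2                       # hypothetical sizes
Ia, Ib, In = np.eye(a), np.eye(b), np.eye(n)
H = [kron3(ones_col(a), ones_col(b), ones_col(n)),
     kron3(Ia, ones_col(b), ones_col(n)),
     kron3(ones_col(a), Ib, ones_col(n)),
     kron3(Ia, Ib, ones_col(n)),
     kron3(Ia, Ib, In)]
A = [Hl @ Hl.T for Hl in H]             # A_l = H_l H_l'

# Every pair commutes, so Theorem 2.3.15 applies to any subset of them
print(all(np.allclose(A[l] @ A[p], A[p] @ A[l]) for l, p in combinations(range(5), 2)))  # True

# Hypothetical variance components for the random effects l = 3, 4
sigma2 = {3: 0.5, 4: 1.0}
Sigma = sum(s2 * A[l] for l, s2 in sigma2.items())   # Var(y)
```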

If $y'Ay$ is a quadratic form in $y$, then $y'Ay$ is distributed as a noncentral
chi-squared variate $\chi_m^2(\eta)$ if and only if $A\Sigma$ is idempotent of rank $m$, where
$\eta$ is the noncentrality parameter and is given by $\eta = g'X'AXg$ (see Searle,
1971, Section 2.5).

The total sum of squares, $y'y$, can be uniquely partitioned as

$$y'y = \sum_{l=0}^{\nu} y'P_ly,$$

where the $P_l$'s are idempotent matrices such that $P_lP_s = 0$ for all $l \neq s$ (see
Khuri, 1982). The quadratic form $y'P_ly$ $(l = 0, 1, \ldots, \nu)$ is positive semidefinite
and represents the sum of squares for the $l$th effect in model (2.13).
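The text does not list the $P_l$'s. For model (2.11) we assume they are the usual ANOVA projection matrices, built as direct products of $\bar{J}_k = (1/k)J_k$ and $I_k - \bar{J}_k$; this is our illustration, not a quotation of Khuri (1982). The Python/NumPy sketch checks that these $P_l$'s are idempotent, mutually orthogonal, and sum to the identity, so that they partition $y'y$; their traces give the familiar degrees of freedom $1$, $a-1$, $b-1$, $(a-1)(b-1)$, $ab(n-1)$.

```python
import numpy as np

def kron3(X, Y, Z):
    """Direct (Kronecker) product X (x) Y (x) Z."""
    return np.kron(np.kron(X, Y), Z)

def jbar(k):
    """(1/k) J_k, the averaging matrix."""
    return np.ones((k, k)) / k

def centering(k):
    """I_k - (1/k) J_k."""
    return np.eye(k) - jbar(k)

a, b, n = 2, 3, 2                                    # hypothetical sizes
P = [kron3(jbar(a), jbar(b), jbar(n)),               # mean
     kron3(centering(a), jbar(b), jbar(n)),          # alpha
     kron3(jbar(a), centering(b), jbar(n)),          # beta
     kron3(centering(a), centering(b), jbar(n)),     # interaction
     kron3(np.eye(a), np.eye(b), centering(n))]      # error

N = a * b * n
print(np.allclose(sum(P), np.eye(N)))                                     # True
print(all(np.allclose(Pl @ Pl, Pl) for Pl in P))                          # True
print(all(np.allclose(P[l] @ P[s], 0) for l in range(5) for s in range(5) if l != s))  # True
print([int(round(np.trace(Pl))) for Pl in P])                             # [1, 1, 2, 2, 6]

rng = np.random.default_rng(11)
y = rng.standard_normal(N)                            # hypothetical data
print(np.isclose(y @ y, sum(y @ Pl @ y for Pl in P)))                     # True
```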

Theorem 2.4.1. Consider the balanced mixed model (2.14), where the
random effects are assumed to be independently and normally distributed
with zero means and variance-covariance matrices $\sigma_l^2 I_{c_l}$ $(l = \theta+1, \theta+2, \ldots, \nu)$.
Then we have the following:

1. $y'P_0y, y'P_1y, \ldots, y'P_\nu y$ are statistically independent.

2. $y'P_ly/\delta_l$ is distributed as a noncentral chi-squared variate with degrees
of freedom equal to the rank of $P_l$ and noncentrality parameter given
by $\eta_l = g'X'P_lXg/\delta_l$ for $l = 0, 1, \ldots, \theta$, where $\delta_l$ is a particular linear
combination of the variance components $\sigma_{\theta+1}^2, \sigma_{\theta+2}^2, \ldots, \sigma_\nu^2$. However,
for $l = \theta+1, \theta+2, \ldots, \nu$, that is, for the random effects, $y'P_ly/\delta_l$ is
distributed as a central chi-squared variate with $m_l$ degrees of freedom,
where $m_l = r(P_l)$.

Proof. See Theorem 4.1 in Khuri (1982). □

Theorem 2.4.1 provides the basis for a complete analysis of any balanced
mixed model, as it can be used to obtain exact tests for testing the
significance of the fixed effects and the variance components.

A linear function $a'g$, of $g$ in model (2.14), is estimable if there exists a
linear function, $c'y$, of the observations such that $E(c'y) = a'g$. In Searle
(1971, Section 5.4) it is shown that $a'g$ is estimable if and only if $a'$ belongs to
the linear span of the rows of $X$. In Khuri (1984) we have the following
theorem:

Theorem 2.4.2. Consider the balanced mixed model in (2.14). Then we
have the following:

1. $r(P_lX) = r(P_l)$, $l = 0, 1, \ldots, \theta$.

2. $r(X) = \sum_{l=0}^{\theta} r(P_lX)$.

3. $P_0Xg, P_1Xg, \ldots, P_\theta Xg$ are linearly independent and span the space of all
estimable linear functions of $g$.

Theorem 2.4.2 is useful in identifying a basis of estimable linear functions
of the fixed effects in model (2.14).


2.4.2. The Singular-Value Decomposition

The singular-value decomposition of a matrix is far more useful, both in
statistics and in matrix algebra, than is commonly realized. For example, it
plays a significant role in regression analysis. Let us consider the linear
model

$$y = X\beta + \epsilon, \tag{2.15}$$

where $y$ is a vector of $n$ observations, $X$ is an $n \times p$ $(n \ge p)$ matrix consisting
of known constants, $\beta$ is an unknown parameter vector, and $\epsilon$ is a random
error vector. Using Theorem 2.3.11, the matrix $X'$ can be expressed as

$$X' = P[D : 0]Q', \tag{2.16}$$

where $P$ and $Q$ are orthogonal matrices of orders $p \times p$ and $n \times n$, respectively,
and $D$ is a diagonal matrix of order $p \times p$ consisting of nonnegative
diagonal elements. These are the singular values of $X$ (or of $X'$) and are the
nonnegative square roots of the eigenvalues of $X'X$. From (2.16) we get

$$X = Q\begin{bmatrix} D \\ 0 \end{bmatrix}P'. \tag{2.17}$$


If the columns of $X$ are linearly related, then they are said to be
multicollinear. In this case, $X$ has rank $r$ $(< p)$, and the columns of $X$ belong
to a vector subspace of dimension $r$. At least one of the eigenvalues of $X'X$,
and hence at least one of the singular values of $X$, will be equal to zero. In
practice, such exact multicollinearities rarely occur in statistical applications.
Rather, the columns of $X$ may be “nearly” linearly related. In this case, the
rank of $X$ is $p$, but some of the singular values of $X$ will be “near zero.” We
shall use the term multicollinearity in a broader sense to describe the latter
situation. It is also common to use the term “ill conditioning” to refer to the
same situation.

The presence of multicollinearities in $X$ can have adverse effects on the
least-squares estimate, $\hat{\beta}$, of $\beta$ in (2.15). This can be easily seen from the fact
that $\hat{\beta} = (X'X)^{-1}X'y$ and $\operatorname{Var}(\hat{\beta}) = (X'X)^{-1}\sigma^2$, where $\sigma^2$ is the error
variance. By (2.17), $X'X = PD^2P'$, so that $(X'X)^{-1} = PD^{-2}P'$, and small singular
values of $X$ produce large elements in $(X'X)^{-1}$. Large variances associated with the
elements of $\hat{\beta}$ can therefore be expected when the columns of $X$ are multicollinear.
This causes $\hat{\beta}$ to become an unreliable estimate of $\beta$. For a detailed study of
multicollinearity and its effects, see Belsley, Kuh, and Welsch (1980, Chapter 3),
Montgomery and Peck (1982, Chapter 8), and Myers (1990, Chapter 3).

The singular-value decomposition of $X$ can provide useful information for
detecting multicollinearity, as we shall now see. Let us suppose that the
columns of $X$ are multicollinear. Because of this, some of the singular values
of $X$, say $p_2$ $(< p)$ of them, will be “near zero.” Let us partition $D$ in (2.17) as

$$D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix},$$

where $D_1$ and $D_2$ are of orders $p_1 \times p_1$ and $p_2 \times p_2$ $(p_1 = p - p_2)$, respectively.
The diagonal elements of $D_2$ consist of those singular values of $X$
labeled as “near zero.” Let us now write (2.17) as

$$XP = Q\begin{bmatrix} D_1 & 0 \\ 0 & D_2 \\ 0 & 0 \end{bmatrix}. \tag{2.18}$$


Let us next partition $P$ and $Q$ as $P = [P_1 : P_2]$, $Q = [Q_1 : Q_2]$, where $P_1$ and
$P_2$ have $p_1$ and $p_2$ columns, respectively, and $Q_1$ and $Q_2$ have $p_1$ and $n - p_1$
columns, respectively. From (2.18) we conclude that

$$XP_1 = Q_1D_1, \tag{2.19}$$

$$XP_2 \approx 0, \tag{2.20}$$

where $\approx$ represents approximate equality. The matrix $XP_2$ is “near zero”
because of the smallness of the diagonal elements of $D_2$.

We note from (2.20) that each column of $P_2$ provides a “near”-linear
relationship among the columns of $X$. If (2.20) were an exact equality, then
the columns of $P_2$ would provide an orthonormal basis for the null space
of $X$.
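The partition (2.18)-(2.20) can be reproduced in Python/NumPy with a hypothetical design whose third column is nearly the sum of the first two. NumPy returns $X = U\,\operatorname{diag}(s)\,V'$ (padded with zero rows), so $Q$ = `U` and $P$ = `Vt.T` in the notation above. Here $p_2 = 1$, and the single column of $P_2$ recovers the planted near-relation $x_1 + x_2 - x_3 \approx 0$.

```python
import numpy as np

rng = np.random.default_rng(12)
n, p = 50, 3
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
x3 = x1 + x2 + 1e-3 * rng.standard_normal(n)      # hypothetical near-linear relation
X = np.column_stack([x1, x2, x3])

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # singular values in descending order
P = Vt.T
p2 = 1                                             # one singular value is "near zero"
p1 = p - p2
P1, P2 = P[:, :p1], P[:, p1:]
Q1, D1 = U[:, :p1], np.diag(s[:p1])

print(np.allclose(X @ P1, Q1 @ D1))                # True: (2.19)
print(np.isclose(np.linalg.norm(X @ P2), s[-1]))   # True: ||X P2|| equals the small s[-1], (2.20)
print(P2[:, 0] / P2[0, 0])                         # approximately [1, 1, -1]
```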

We have mentioned that the presence of multicollinearity is indicated by
the “smallness” of the singular values of $X$. The problem now is to determine
what “small” is. For this purpose it is common in statistics to use the
condition number of $X$, denoted by $\kappa(X)$. By definition

$$\kappa(X) = \frac{\lambda_{\max}}{\lambda_{\min}},$$

where $\lambda_{\max}$ and $\lambda_{\min}$ are, respectively, the largest and smallest singular
values of $X$. Since the singular values of $X$ are the nonnegative square roots of the
eigenvalues of $X'X$, then $\kappa(X)$ can also be written as

$$\kappa(X) = \left[\frac{e_{\max}(X'X)}{e_{\min}(X'X)}\right]^{1/2}.$$

If $\kappa(X)$ is less than 10, then there is no serious problem with
multicollinearity. Values of $\kappa(X)$ between 10 and 30 indicate moderate to strong
multicollinearity, and if $\kappa(X) > 30$, severe multicollinearity is implied.
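The condition number and the rule of thumb above can be sketched in Python/NumPy for two hypothetical designs, one with unrelated columns and one with a planted near-linear relation. The sketch also confirms that the singular-value form and the $X'X$ eigenvalue form of $\kappa(X)$ agree; `np.linalg.cond(X)` computes the same ratio of singular values.

```python
import numpy as np

def condition_number(X):
    """kappa(X): largest over smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)   # descending order
    return s[0] / s[-1]

def diagnose(kappa):
    """Rule of thumb quoted in the text."""
    if kappa < 10:
        return "no serious problem"
    if kappa <= 30:
        return "moderate to strong"
    return "severe"

rng = np.random.default_rng(13)
n = 50
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
X_ok = np.column_stack([x1, x2, rng.standard_normal(n)])                    # hypothetical
X_bad = np.column_stack([x1, x2, x1 + x2 + 1e-3 * rng.standard_normal(n)])  # hypothetical

for X in (X_ok, X_bad):
    kappa = condition_number(X)
    e = np.linalg.eigvalsh(X.T @ X)          # ascending
    assert np.isclose(kappa, np.sqrt(e[-1] / e[0]), rtol=1e-6)
    print(diagnose(kappa))
# prints "no serious problem", then "severe"
```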

More detailed discussions concerning the use of the singular-value decomposition
in regression can be found in Mandel (1982). See also Lowerre
(1982). Good (1969) described several applications of this decomposition in
statistics and in matrix algebra.





2.4.3. Extrema of Quadratic Forms 

In many statistical problems there is a need to find the extremum (maximum
or minimum) of a quadratic form or a ratio of quadratic forms. Let us, for
example, consider the following problem:

Let $X_1, X_2, \ldots, X_n$ be a collection of random vectors, all having the same
number of elements. Suppose that these vectors are independently and
identically distributed (i.i.d.) as $N(\mu, \Sigma)$, where both $\mu$ and $\Sigma$ are unknown.
Consider testing the hypothesis $H_0: \mu = \mu_0$ versus its alternative $\mu \neq \mu_0$,
where $\mu_0$ is some hypothesized value of $\mu$. We need to develop a test
statistic for testing $H_0$.

The multivariate hypothesis $H_0$ is true if and only if the univariate
hypotheses

$$H_0(\lambda): \lambda'\mu = \lambda'\mu_0$$

are true for all $\lambda \neq 0$. A test statistic for testing $H_0(\lambda)$ is the following:

$$t(\lambda) = \frac{\lambda'(\bar{X} - \mu_0)\sqrt{n}}{\sqrt{\lambda'S\lambda}},$$

where $\bar{X} = \sum_{i=1}^{n} X_i/n$ and $S$ is the sample variance-covariance matrix, which
is an unbiased estimator of $\Sigma$, and is given by

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'.$$

Large values of $t^2(\lambda)$ indicate falsehood of $H_0(\lambda)$. Since $H_0$ is rejected if
and only if $H_0(\lambda)$ is rejected for at least one $\lambda$, the condition to reject
$H_0$ at the $\alpha$-level is $\sup_{\lambda \neq 0}[t^2(\lambda)] > C_\alpha$, where $C_\alpha$ is the upper $100\alpha\%$ point
of the distribution of $\sup_{\lambda \neq 0}[t^2(\lambda)]$. But


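$$\begin{aligned}
\sup_{\lambda \neq 0}[t^2(\lambda)] &= \sup_{\lambda \neq 0} \frac{n[\lambda'(\bar{X} - \mu_0)]^2}{\lambda'S\lambda} \\
&= n\sup_{\lambda \neq 0} \frac{\lambda'(\bar{X} - \mu_0)(\bar{X} - \mu_0)'\lambda}{\lambda'S\lambda} \\
&= n\,e_{\max}[S^{-1}(\bar{X} - \mu_0)(\bar{X} - \mu_0)'],
\end{aligned}$$

by Theorem 2.3.17. Now,

$$\begin{aligned}
e_{\max}[S^{-1}(\bar{X} - \mu_0)(\bar{X} - \mu_0)'] &= e_{\max}[(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)] \quad \text{by Theorem 2.3.9,} \\
&= (\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0),
\end{aligned}$$

since the latter is a scalar. Hence,

$$\sup_{\lambda \neq 0}[t^2(\lambda)] = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0)$$

is the test statistic for the multivariate hypothesis $H_0$. This is called Hotelling's
$T^2$-statistic. Its critical values are obtained in terms of the critical values of
the $F$-distribution (see, for example, Morrison, 1967, Chapter 4).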
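The derivation can be checked numerically. The Python/NumPy sketch below draws a hypothetical sample, computes Hotelling's $T^2$, and confirms that $t^2(\lambda)$ reaches $T^2$ at $\lambda = S^{-1}(\bar{X} - \mu_0)$ (the maximizing direction implied by the Cauchy-Schwarz inequality) and never exceeds it for random $\lambda$. The last line applies the standard conversion $\frac{n-p}{p(n-1)}T^2 \sim F_{p,\,n-p}$ under $H_0$; it is included for orientation only.

```python
import numpy as np

rng = np.random.default_rng(14)
n, p = 30, 3
mu0 = np.zeros(p)                                              # hypothetical mu_0
Xs = rng.multivariate_normal([0.3, 0.0, -0.2], np.eye(p), size=n)   # hypothetical sample

xbar = Xs.mean(axis=0)
S = np.cov(Xs, rowvar=False)             # divisor n - 1, as in the text
d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)       # n (xbar - mu0)' S^{-1} (xbar - mu0)

def t2(lam):
    """t^2(lambda) = n [lambda'(xbar - mu0)]^2 / (lambda' S lambda)."""
    return n * (lam @ d) ** 2 / (lam @ S @ lam)

lam_star = np.linalg.solve(S, d)         # maximizing direction S^{-1}(xbar - mu0)
random_lams = rng.standard_normal((1000, p))
print(np.isclose(t2(lam_star), T2))                            # True: supremum attained
print(max(t2(l) for l in random_lams) <= T2 * (1 + 1e-12))     # True
F = (n - p) / (p * (n - 1)) * T2         # F_{p, n-p} distributed under H0
```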

Another example of using the extremum of a ratio of quadratic forms is in
the determination of the canonical correlation coefficient between two random
vectors (see Exercise 2.26). The article by Bush and Olkin (1959) lists
several similar statistical applications.

2.4.4. The Parameterization of Orthogonal Matrices 

Orthogonal matrices are used frequently in statistics, especially in linear
models and multivariate analysis (see, for example, Graybill, 1961, Chapter
11; James, 1954).

The elements of an $n \times n$ orthogonal matrix $Q$ are subject to $n(n+1)/2$
constraints because $Q'Q = I$. These elements can therefore be represented
by $n^2 - n(n+1)/2 = n(n-1)/2$ independent parameters. The need for such
a representation arises in several situations. For example, in the design of
experiments, there may be a need to search for an orthogonal matrix that
satisfies a certain optimality criterion. Using the independent parameters of
an orthogonal matrix can facilitate this search. Khuri and Myers (1981)
followed this approach in their construction of a response surface design that
is robust to nonnormality of the error distribution associated with the
response function. Another example is the generation of random orthogonal
matrices for carrying out simulation experiments. This was used by Heiberger,
Velleman, and Ypelaar (1983) to construct test data with special properties
for multivariate linear models. Anderson, Olkin, and Underhill (1987) proposed
a procedure to generate random orthogonal matrices.

Methods to parameterize an orthogonal matrix were reviewed in Khuri
and Good (1989). One such method is to use the relationship between an
orthogonal matrix and a skew-symmetric matrix. If $Q$ is an orthogonal matrix
with determinant equal to one, then it can be written in the form

$$Q = e^{T},$$

where $T$ is a skew-symmetric matrix (see, for example, Gantmacher, 1959).
The elements of $T$ above its main diagonal can be used to parameterize $Q$.
This exponential mapping is defined by the infinite series

$$e^{T} = I + T + \frac{T^2}{2!} + \frac{T^3}{3!} + \cdots.$$




The exponential parameterization was used in a theorem concerning the
asymptotic joint density function of the eigenvalues of the sample
variance-covariance matrix (Muirhead, 1982, page 394).
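The exponential parameterization can be sketched in Python/NumPy for $n = 3$: three hypothetical parameters fill the elements of $T$ above the diagonal, the lower triangle is set by skew-symmetry, and $Q = e^T$ is computed from the truncated series. For a matrix of moderate norm, 30 terms are far more than double precision needs; `scipy.linalg.expm` is the usual library routine for general matrices.

```python
import numpy as np

def expm_series(T, terms=30):
    """Truncated series e^T = I + T + T^2/2! + ... (adequate for moderate ||T||)."""
    result = np.eye(T.shape[0])
    term = np.eye(T.shape[0])
    for k in range(1, terms):
        term = term @ T / k              # now T^k / k!
        result = result + term
    return result

# Hypothetical parameters: the n(n-1)/2 = 3 elements of T above the diagonal (n = 3)
theta = [0.3, -0.5, 0.8]
T = np.zeros((3, 3))
T[np.triu_indices(3, k=1)] = theta
T = T - T.T                              # skew-symmetric: T' = -T

Q = expm_series(T)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
print(np.isclose(np.linalg.det(Q), 1.0)) # True: det(e^T) = e^{tr T} = 1
```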

Another parameterization of $Q$ is given by

$$Q = (I - U)(I + U)^{-1},$$

where $U$ is a skew-symmetric matrix. This relationship is valid provided that
$Q$ does not have the eigenvalue $-1$. Otherwise, $Q$ can be written as

$$Q = L(I - U)(I + U)^{-1},$$

where $L$ is a diagonal matrix in which each element on the diagonal is either
1 or $-1$. Arthur Cayley (1821-1895) is credited with having introduced the
relationship between $Q$ and $U$.
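The Cayley relationship can be checked in Python/NumPy with a hypothetical skew-symmetric $U$. $I + U$ is always nonsingular, because the eigenvalues of a real skew-symmetric matrix are purely imaginary. The same map applied to $Q$ returns $U$, since $I - Q = 2U(I + U)^{-1}$ and $I + Q = 2(I + U)^{-1}$.

```python
import numpy as np

def cayley(U):
    """Q = (I - U)(I + U)^{-1}."""
    I = np.eye(U.shape[0])
    return (I - U) @ np.linalg.inv(I + U)

U = np.array([[0.0, 0.4, -1.2],
              [-0.4, 0.0, 0.7],
              [1.2, -0.7, 0.0]])          # hypothetical skew-symmetric matrix
Q = cayley(U)

print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
print(np.isclose(np.linalg.det(Q), 1.0)) # True
print(np.allclose(cayley(Q), U))         # True: the map inverts itself
```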

Finally, the recent article by Olkin (1990) illustrates the strong interplay
between statistics and linear algebra. The author listed several areas of
statistics with a strong linear algebra component.


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Anderson, T. W., and S. D. Gupta (1963). “Some inequalities on characteristic roots
of matrices.” Biometrika, 50, 522-524.

Anderson, T. W., I. Olkin, and L. G. Underhill (1987). “Generation of random
orthogonal matrices.” SIAM J. Sci. Statist. Comput., 8, 625-629.

Basilevsky, A. (1983). Applied Matrix Algebra in the Statistical Sciences. North-Holland,
New York. (This book addresses topics in matrix algebra that are useful in both
applied and theoretical branches of the statistical sciences.)

Bellman, R. (1970). Introduction to Matrix Analysis, 2nd ed. McGraw-Hill, New York.
(An excellent reference book on matrix algebra. The minimum-maximum
characterization of eigenvalues is discussed in Chap. 7. Kronecker products are
studied in Chap. 12. Some applications of matrices to stochastic processes and
probability theory are given in Chap. 14.)

Belsley, D. A., E. Kuh, and R. E. Welsch (1980). Regression Diagnostics. Wiley, New
York. (This is a good reference for learning about multicollinearity in linear
statistical models that was discussed in Section 2.4.2. Examples are provided
based on actual econometric data.)

Bush, K. A., and I. Olkin (1959). “Extrema of quadratic forms with applications to 
statistics.” Biometrika, 46, 483-486. 

Gantmacher, F. R. (1959). The Theory of Matrices, Vols. I and II. Chelsea, New York. 
(These two volumes provide a rather more advanced study of matrix algebra than 
standard introductory texts. Methods to parameterize an orthogonal matrix, 
which were mentioned in Section 2.4.4, are discussed in Vol. I.) 





Golub, G. H., and C. F. Van Loan (1983). Matrix Computations. Johns Hopkins 
University Press, Baltimore, Maryland. 

Good, I. J. (1969). “Some applications of the singular decomposition of a matrix.”
Technometrics, 11, 823-831.

Graybill, F. A. (1961). An Introduction to Linear Statistical Models, Vol. I. McGraw-Hill, 
New York. (This is considered a classic textbook in experimental statistics. It is 
concerned with the mathematical treatment, using matrix algebra, of linear 
statistical models.) 

Graybill, F. A. (1983). Matrices with Applications in Statistics, 2nd ed. Wadsworth,
Belmont, California. (This frequently referenced textbook contains a great number
of theorems in matrix algebra, and describes many properties of matrices that
are pertinent to linear models and mathematical statistics.)

Healy, M. J. R. (1986). Matrices for Statistics. Clarendon Press, Oxford, England. (This
is a short book that provides a brief coverage of some basic concepts in matrix
algebra. Some applications in statistics are also mentioned.)

Heiberger, R. M., P. F. Velleman, and M. A. Ypelaar (1983). “Generating test data
with independently controllable features for multivariate general linear forms.”
J. Amer. Statist. Assoc., 78, 585-595.

Henderson, H. V., F. Pukelsheim, and S. R. Searle (1983). “On the history of the
Kronecker product.” Linear and Multilinear Algebra, 14, 113-120.

Henderson, H. V., and S. R. Searle (1981). “The vec-permutation matrix, the vec
operator and Kronecker products: A review.” Linear and Multilinear Algebra, 9,
271-288.

Hoerl, A. E., and R. W. Kennard (1970). “Ridge regression: Applications to
nonorthogonal problems.” Technometrics, 12, 69-82.

James, A. T. (1954). “Normal multivariate analysis and the orthogonal group.” Ann.
Math. Statist., 25, 40-75.

Khuri, A. I. (1982). “Direct products: A powerful tool for the analysis of balanced
data.” Comm. Statist. Theory Methods, 11, 2903-2920.

Khuri, A. I. (1984). “Interval estimation of fixed effects and of functions of variance
components in balanced mixed models.” Sankhyā, Series B, 46, 10-28. (Section 5
in this article gives a procedure for the construction of exact simultaneous
confidence intervals on estimable linear functions of the fixed effects in a
balanced mixed model.)

Khuri, A. I., and I. J. Good (1989). “The parameterization of orthogonal matrices:
A review mainly for statisticians.” South African Statist. J., 23, 231-250.

Khuri, A. I., and R. H. Myers (1981). “Design related robustness of tests in regression
models.” Comm. Statist. Theory Methods, 10, 223-235.

Lancaster, P. (1969). Theory of Matrices. Academic Press, New York. (This book is
written primarily for students of applied mathematics, engineering, or science
who want to acquire a good knowledge of the theory of matrices. Chap. 7 has an
interesting discussion concerning the behavior of matrix eigenvalues under
perturbation of the elements of the matrix.)

Lowerre, J. M. (1982). “An introduction to modern matrix methods and statistics.”
Amer. Statist., 36, 113-115. (An application of the singular-value decomposition
is given in Section 2 of this article.)





Magnus, J. R., and H. Neudecker (1988). Matrix Differential Calculus with Applications
in Statistics and Econometrics. Wiley, New York. (This book consists of six parts.
Part one deals with the basics of matrix algebra. The remaining parts are devoted
to the development of matrix differential calculus and its applications to statistics
and econometrics. Part four has a chapter on inequalities concerning eigenvalues
that pertains to Section 2.3.10 in this chapter.)

Mandel, J. (1982). “Use of the singular-value decomposition in regression analysis.”
Amer. Statist., 36, 15-24.

Marcus, M., and H. Minc (1988). Introduction to Linear Algebra. Dover, New York.
(This book presents an introduction to the fundamental concepts of linear
algebra and matrix theory.)

Marsaglia, G., and G. P. H. Styan (1974). “Equalities and inequalities for ranks of 
matrices.” Linear and Multilinear Algebra, 2, 269-292. (This is an interesting 
collection of results on ranks of matrices. It includes a wide variety of equalities 
and inequalities for ranks of products, of sums, and of partitioned matrices.) 

May, W. G. (1970). Linear Algebra. Scott, Foresman and Company, Glenview, Illinois. 

Montgomery, D. C., and E. A. Peck (1982). Introduction to Linear Regression Analysis.
Wiley, New York. (Chap. 8 in this book has an interesting discussion concerning
multicollinearity. It includes the sources of multicollinearity, its harmful effects in
regression, available diagnostics, and a survey of remedial measures. This chapter
provides useful additional information to the material in Section 2.4.2.)

Morrison, D. F. (1967). Multivariate Statistical Methods. McGraw-Hill, New York.
(This book can serve as an introductory text to multivariate analysis.)

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
(This book is designed as a text for a graduate-level course in multivariate
analysis.)

Myers, R. H. (1990). Classical and Modern Regression with Applications, 2nd ed.
PWS-Kent, Boston. (Chap. 8 in this book should be useful reading concerning
multicollinearity and its effects.)

Newcomb, R. W. (1960). “On the simultaneous diagonalization of two semidefinite
matrices.” Quart. Appl. Math., 19, 144-146.

Olkin, I. (1990). “Interface between statistics and linear algebra.” In Matrix Theory
and Applications, Vol. 40, C. R. Johnson, ed., American Mathematical Society,
Providence, Rhode Island, pp. 233-256.

Price, G. B. (1947). “Some identities in the theory of determinants.” Amer. Math.
Monthly, 54, 75-90. (Section 10 in this article gives some history of the theory of
determinants.)

Rogers, G. S. (1984). “Kronecker products in ANOVA – a first step.” Amer. Statist.,
38, 197-202.

Searle, S. R. (1971). Linear Models. Wiley, New York.

Searle, S. R. (1982). Matrix Algebra Useful for Statistics. Wiley, New York. (This is a
useful book introducing matrix algebra in a manner that is helpful in the
statistical analysis of data and in statistics in general. Chaps. 13, 14, and 15
present applications of matrices in regression and linear models.)

Seber, G. A. F. (1984). Multivariate Observations. Wiley, New York. (This is a good 
reference on applied multivariate analysis that is suited for a graduate-level 
course.) 





Smith, D. E. (1958). History of Mathematics. Vol. I. Dover, New York. (This 
interesting book contains, among other things, some history concerning the development 
of the theory of determinants and matrices.) 

Wolkowicz, H., and G. P. H. Styan (1980). “Bounds for eigenvalues using traces.” 
Linear Algebra Appl., 29, 471-506. 


EXERCISES 
In Mathematics 

2.1. Show that a set of $n \times 1$ vectors, $u_1, u_2, \ldots, u_m$, is always linearly 
dependent if $m > n$. 

2.2. Let $W$ be a vector subspace of $V$ such that $W = L(u_1, u_2, \ldots, u_n)$, 
where the $u_i$'s ($i = 1, 2, \ldots, n$) are linearly independent. If $v$ is any 
vector in $V$ that is not in $W$, show that the vectors $u_1, u_2, \ldots, u_n, v$ are 
linearly independent. 

2.3. Prove Theorem 2.1.3. 

2.4. Prove part 1 of Theorem 2.2.1. 

2.5. Let $T: U \to V$ be a linear transformation. Show that $T$ is one-to-one if 
and only if whenever $u_1, u_2, \ldots, u_n$ are linearly independent in $U$, then 
$T(u_1), T(u_2), \ldots, T(u_n)$ are linearly independent in $V$. 

2.6. Let $T: R^n \to R^m$ be represented by an $n \times m$ matrix of rank $\rho$. 

(a) Show that $\dim[T(R^n)] = \rho$. 

(b) Show that if $n \le m$ and $\rho = n$, then $T$ is one-to-one. 

2.7. Show that tr(A'A) = 0 if and only if A = 0. 

2.8. Let A be a symmetric positive semidefinite matrix of order $n \times n$. Show 
that $v'Av = 0$ if and only if $Av = 0$. 

2.9. The matrices A and B are symmetric and positive semidefinite of order 
$n \times n$ such that $AB = BA$. Show that $AB$ is positive semidefinite. 

2.10. If A is a symmetric $n \times n$ matrix, and B is an $n \times n$ skew-symmetric 
matrix, then show that $\mathrm{tr}(AB) = 0$. 

2.11. Suppose that tr(PA) = 0 for every skew-symmetric matrix P. Show that 
the matrix A is symmetric. 





2.12. Let A be an $n \times n$ matrix and C be a nonsingular matrix of order $n \times n$. 
Show that $A$, $C^{-1}AC$, and $CAC^{-1}$ have the same set of eigenvalues. 

2.13. Let A be an $n \times n$ symmetric matrix, and let $\lambda$ be an eigenvalue of A of 
multiplicity k. Show that $A - \lambda I_n$ has rank $n - k$. 

2.14. Let A be a nonsingular matrix of order $n \times n$, and let c and d be $n \times 1$ 
vectors. If $d'A^{-1}c \ne -1$, then 

$$(A + cd')^{-1} = A^{-1} - \frac{(A^{-1}c)(d'A^{-1})}{1 + d'A^{-1}c}.$$

This is known as the Sherman-Morrison formula. 
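A quick numerical sanity check of the Sherman-Morrison formula can be written in Python with exact rational arithmetic (`fractions.Fraction`), so that both sides can be compared for exact equality rather than up to rounding. This is a sketch only: the 3 × 3 matrix A and the vectors c and d are arbitrary choices of ours (for them $d'A^{-1}c = 7/6 \ne -1$), and the helper names `inverse` and `sherman_morrison` are hypothetical, not from the text. A check on one example illustrates the identity; it does not prove it.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inverse in exact Fraction arithmetic (A must be nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def sherman_morrison(A_inv, c, d):
    """(A + c d')^{-1} computed from A^{-1}; requires 1 + d'A^{-1}c != 0."""
    n = len(A_inv)
    Ainv_c = [sum(A_inv[i][k] * c[k] for k in range(n)) for i in range(n)]  # A^{-1} c
    d_Ainv = [sum(d[k] * A_inv[k][j] for k in range(n)) for j in range(n)]  # d' A^{-1}
    denom = 1 + sum(d[k] * Ainv_c[k] for k in range(n))                     # 1 + d' A^{-1} c
    return [[A_inv[i][j] - Ainv_c[i] * d_Ainv[j] / denom for j in range(n)]
            for i in range(n)]


A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
c = [1, 2, 0]
d = [0, 1, -1]
updated = [[A[i][j] + c[i] * d[j] for j in range(3)] for i in range(3)]  # A + c d'

sm = sherman_morrison(inverse(A), c, d)
print(sm == inverse(updated))                      # formula vs. direct inverse
print(mat_mul(updated, sm) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The rank-one update costs only matrix-vector products once $A^{-1}$ is known, which is why the formula is useful when a matrix is modified by a single term $cd'$.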


2.15. Show that if A and $I_k + V'A^{-1}U$ are nonsingular, then 

$$(A + UV')^{-1} = A^{-1} - A^{-1}U(I_k + V'A^{-1}U)^{-1}V'A^{-1},$$

where A is of order $n \times n$, and U and V are of order $n \times k$. This result 
is known as the Sherman-Morrison-Woodbury formula and is a generalization 
of the result in Exercise 2.14. 
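The Woodbury identity can be checked in the same way. The sketch below again uses exact `Fraction` arithmetic; the matrices U and V of order 3 × 2 and all helper names are hypothetical choices of ours. It evaluates the right-hand side of the formula, which inverts only the k × k matrix $I_k + V'A^{-1}U$, and compares it with a direct inverse of $A + UV'$. In practice this is the appeal of the formula: when $A^{-1}$ is already available and k is much smaller than n, only a k × k system has to be inverted.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_add(A, B, sign=1):
    """A + B (sign=1) or A - B (sign=-1), entrywise."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Gauss-Jordan inverse in exact Fraction arithmetic (A must be nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def woodbury(A_inv, U, V):
    """(A + U V')^{-1} = A^{-1} - A^{-1}U (I_k + V'A^{-1}U)^{-1} V'A^{-1}."""
    k = len(U[0])
    Ainv_U = mat_mul(A_inv, U)                                           # n x k
    Vt_Ainv = mat_mul(transpose(V), A_inv)                               # k x n
    core = inverse(mat_add(identity(k), mat_mul(transpose(V), Ainv_U)))  # k x k
    return mat_add(A_inv, mat_mul(mat_mul(Ainv_U, core), Vt_Ainv), sign=-1)


A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
U = [[1, 0], [0, 1], [1, 1]]
V = [[1, 0], [1, 1], [0, 1]]

lhs = inverse(mat_add(A, mat_mul(U, transpose(V))))   # direct (A + UV')^{-1}
rhs = woodbury(inverse(A), U, V)
print(lhs == rhs)
```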

2.16. Prove Theorem 2.3.17. 

2.17. Let A and B be $n \times n$ idempotent matrices. Show that $A - B$ is 
idempotent if and only if $AB = BA = B$. 

2.18. Let A be an orthogonal matrix. What can be said about the eigenvalues 
of A? 


2.19. Let A be a symmetric matrix of order $n \times n$, and let L be a matrix of 
order $n \times m$. Show that 

$$e_{\min}(A)\,\mathrm{tr}(L'L) \le \mathrm{tr}(L'AL) \le e_{\max}(A)\,\mathrm{tr}(L'L).$$


2.20. Let A be a nonnegative definite matrix of order $n \times n$, and let L be a 
matrix of order $n \times m$. Show that 

(a) $e_{\min}(L'AL) \ge e_{\min}(A)\,e_{\min}(L'L)$, 

(b) $e_{\max}(L'AL) \le e_{\max}(A)\,e_{\max}(L'L)$. 

2.21. Let A and B be $n \times n$ symmetric matrices with A nonnegative definite. 
Show that 

$$e_{\min}(B)\,\mathrm{tr}(A) \le \mathrm{tr}(AB) \le e_{\max}(B)\,\mathrm{tr}(A).$$





2.22. Let $A^-$ be a g-inverse of A. Show that 

(a) $A^-A$ is idempotent, 

(b) $r(A^-) \ge r(A)$, 

(c) $r(A) = r(A^-A)$. 

In Statistics 

2.23. Let $y = (y_1, y_2, \ldots, y_n)'$ be a normal random vector $N(0, \sigma^2 I_n)$. Let $\bar{y}$ 
and $s^2$ be the sample mean and sample variance given by 

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.$$

(a) Show that A is an idempotent matrix of rank $n - 1$, where A is an 
$n \times n$ matrix such that $y'Ay = (n-1)s^2$. 

(b) What distribution does $(n-1)s^2/\sigma^2$ have? 

(c) Show that $\bar{y}$ and $Ay$ are uncorrelated; then conclude that $\bar{y}$ and $s^2$ 
are statistically independent. 
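For part (a), one natural choice of A (our assumption; the exercise does not name it) is the symmetric centering matrix $A = I_n - \frac{1}{n}J_n$, where $J_n$ is the $n \times n$ matrix of ones. The sketch below checks, for one arbitrary data vector and in exact arithmetic, that $y'Ay = (n-1)s^2$, that $A^2 = A$, and that $\mathrm{tr}(A) = n - 1$ (for an idempotent matrix the trace equals the rank). It illustrates part (a) numerically and is not a proof.

```python
from fractions import Fraction


def centering_matrix(n):
    """A = I_n - (1/n) J_n, with J_n the n x n matrix of ones."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def quad_form(y, A):
    """y' A y."""
    n = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


y = [Fraction(v) for v in (3, 7, 1, 4, 10)]   # arbitrary illustrative data
n = len(y)
A = centering_matrix(n)

y_bar = sum(y) / n
s2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)   # sample variance

print(quad_form(y, A) == (n - 1) * s2)   # y'Ay = (n - 1) s^2
print(mat_mul(A, A) == A)                # A is idempotent
print(sum(A[i][i] for i in range(n)))    # trace(A) = n - 1
```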

2.24. Consider the one-way classification model 

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

where $\mu$ and $\alpha_i$ ($i = 1, 2, \ldots, a$) are unknown parameters and $\epsilon_{ij}$ is a 
random error with a zero mean. Show that 

(a) $\alpha_i - \alpha_{i'}$ is an estimable linear function for all $i, i' = 1, 2, \ldots, a$, 

(b) $\mu$ is nonestimable. 

2.25. Consider the linear model 

$$y = X\beta + \epsilon,$$

where X is a known matrix of order $n \times p$ and rank $r$ ($\le p$), $\beta$ is an 
unknown parameter vector, and $\epsilon$ is a random error vector such that 
$E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I_n$. 

(a) Show that $X(X'X)^-X'$ is an idempotent matrix. 

(b) Let $l'y$ be an unbiased linear estimator of $\lambda'\beta$. Show that 

$$\mathrm{Var}(\lambda'\hat{\beta}) \le \mathrm{Var}(l'y),$$

where $\lambda'\hat{\beta} = \lambda'(X'X)^-X'y$. 

The result given in part (b) is known as the Gauss-Markov theorem. 






2.26. Consider the linear model in Exercise 2.25, and suppose that $r(X) = p$. 
Hoerl and Kennard (1970) introduced an estimator of $\beta$ called the 
ridge estimator $\beta^*$: 

$$\beta^* = (X'X + kI_p)^{-1}X'y,$$

where k is a “small” fixed number. For an appropriate value of k, $\beta^*$ 
provides improved accuracy in the estimation of $\beta$ over the least-squares 
estimator $\hat{\beta} = (X'X)^{-1}X'y$. Let $X'X = P\Lambda P'$ be the spectral decomposition 
of $X'X$. Show that $\beta^* = PDP'\hat{\beta}$, where D is a diagonal matrix 
whose ith diagonal element is $\lambda_i/(\lambda_i + k)$, $i = 1, 2, \ldots, p$, and where 
$\lambda_1, \lambda_2, \ldots, \lambda_p$ are the diagonal elements of $\Lambda$. 
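The identity in Exercise 2.26 can be checked numerically for $p = 2$, where the spectral decomposition of $X'X$ has a closed form: a single Jacobi rotation diagonalizes a symmetric 2 × 2 matrix (a technique we substitute here to avoid depending on a linear-algebra library). The data, the ridge constant $k = 0.3$, and the helper names below are hypothetical. The script computes the ridge estimator directly as $(X'X + kI_2)^{-1}X'y$ and again as $PDP'\hat{\beta}$ with $D = \mathrm{diag}(\lambda_i/(\lambda_i + k))$; the two should agree up to rounding.

```python
import math


def solve2(S, v):
    """Solve the 2 x 2 system S z = v by Cramer's rule (S assumed nonsingular)."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(v[0] * S[1][1] - S[0][1] * v[1]) / det,
            (S[0][0] * v[1] - v[0] * S[1][0]) / det]


def sym_eig2(S):
    """Spectral decomposition S = P diag(lams) P' of a symmetric 2 x 2 matrix via
    one Jacobi rotation; the columns of P are orthonormal eigenvectors."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    theta = 0.5 * math.atan2(2 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    P = [[cs, -sn], [sn, cs]]
    lam1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn   # p1' S p1
    lam2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs   # p2' S p2
    return [lam1, lam2], P


# Hypothetical data: an intercept column and one regressor, so p = 2.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.5], [1.0, 4.0]]
y = [1.1, 2.0, 2.9, 4.2, 5.1]
k = 0.3  # ridge constant, chosen arbitrarily

XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]

beta_hat = solve2(XtX, Xty)                                  # least squares
beta_ridge = solve2([[XtX[0][0] + k, XtX[0][1]],
                     [XtX[1][0], XtX[1][1] + k]], Xty)       # (X'X + kI)^{-1} X'y

lams, P = sym_eig2(XtX)
w = [P[0][i] * beta_hat[0] + P[1][i] * beta_hat[1] for i in range(2)]  # P' beta_hat
w = [lam / (lam + k) * wi for lam, wi in zip(lams, w)]                 # D P' beta_hat
beta_spectral = [P[i][0] * w[0] + P[i][1] * w[1] for i in range(2)]    # P D P' beta_hat

print(beta_ridge)
print(beta_spectral)
```

Since each factor $\lambda_i/(\lambda_i + k)$ lies in (0, 1), this form also shows that the ridge estimator shrinks the components of $\hat{\beta}$ along the eigenvectors of $X'X$, most strongly in directions with small $\lambda_i$.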

2.27. Consider the ratio 

$$\rho^2 = \frac{(x'Ay)^2}{(x'B_1x)(y'B_2y)},$$

where A is a matrix of order $m \times n$ and $B_1$, $B_2$ are positive definite of 
orders $m \times m$ and $n \times n$, respectively. Show that 

$$\sup_{x, y} \rho^2 = e_{\max}(B_1^{-1}AB_2^{-1}A').$$

[Hint: Define $C_1$ and $C_2$ as symmetric nonsingular matrices such that 
$C_1^2 = B_1$, $C_2^2 = B_2$. Let $C_1x = u$, $C_2y = v$. Then $\rho^2$ can be written as 

$$\rho^2 = \frac{(u'C_1^{-1}AC_2^{-1}v)^2}{(u'u)(v'v)} = (\tau'C_1^{-1}AC_2^{-1}t)^2,$$

where $\tau = u/(u'u)^{1/2}$, $t = v/(v'v)^{1/2}$ are unit vectors. Verify the result 
of this problem after noting that $\rho^2$ is now the square of a dot 
product.] 

Note: This exercise has the following application in multivariate analysis: 
Let $z_1$ and $z_2$ be random vectors with zero means and variance-covariance 
matrices $\Sigma_{11}$, $\Sigma_{22}$, respectively. Let $\Sigma_{12}$ be the covariance 
matrix of $z_1$ and $z_2$. On choosing $A = \Sigma_{12}$, $B_1 = \Sigma_{11}$, $B_2 = \Sigma_{22}$, the 
positive square root of the supremum of $\rho^2$ is called the canonical 
correlation coefficient between $z_1$ and $z_2$. It is a measure of the linear 
association between $z_1$ and $z_2$ (see, for example, Seber, 1984, Section 
5.7). 
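The supremum in Exercise 2.27 can be sanity-checked for $m = n = 2$. Because $\rho^2$ is unchanged when x or y is rescaled (including a sign change), it suffices to let x and y range over unit vectors $(\cos\theta, \sin\theta)'$ with $\theta \in [0, \pi)$. The Python sketch below compares a grid search over $(\theta, \phi)$ with the closed form $e_{\max}(B_1^{-1}AB_2^{-1}A')$. The matrices are arbitrary choices of ours ($B_1$ and $B_2$ positive definite), the helper names are hypothetical, and the grid maximum only approximates the supremum from below.

```python
import math


def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def e_max2(M):
    """Largest eigenvalue of a 2 x 2 matrix with real eigenvalues.
    B1^{-1} A B2^{-1} A' is similar to a symmetric PSD matrix, so this applies."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2


def quad(M, v):
    """v' M v for a 2-vector v."""
    return (v[0] * (M[0][0] * v[0] + M[0][1] * v[1])
            + v[1] * (M[1][0] * v[0] + M[1][1] * v[1]))


A = [[1.0, 0.4], [0.2, 0.8]]
B1 = [[2.0, 0.5], [0.5, 1.0]]      # positive definite
B2 = [[1.5, -0.3], [-0.3, 1.2]]    # positive definite
At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

closed_form = e_max2(mat_mul(mat_mul(mat_mul(inv2(B1), A), inv2(B2)), At))

steps = 400
unit = [[math.cos(math.pi * t / steps), math.sin(math.pi * t / steps)]
        for t in range(steps)]
x_side = [(u, quad(B1, u)) for u in unit]                          # (x, x' B1 x)
y_side = [([A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]],
           quad(B2, v)) for v in unit]                             # (A y, y' B2 y)

grid_max = max((x[0] * Ay[0] + x[1] * Ay[1]) ** 2 / (qx * qy)
               for x, qx in x_side for Ay, qy in y_side)

print(closed_form, grid_max)
```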



CHAPTER 3 


Limits and Continuity of Functions 


The notions of limits and continuity of functions lie at the kernel of calculus. 
The general concept of continuity is very old in mathematics. It had its 
inception long ago in ancient Greece. We owe to Aristotle (384-322 B.C.) 
the first known definition of continuity: “A thing is continuous when of any 
two successive parts the limits at which they touch are one and the same and 
are, as the word implies, held together” (see Smith, 1958, page 93). Our 
present definitions of limits and continuity of functions, however, are 
substantially those given by Augustin-Louis Cauchy (1789-1857). 

In this chapter we introduce the concepts of limits and continuity of 
real-valued functions, and study some of their properties. The domains of 
definition of the functions will be subsets of R, the set of real numbers. 
A typical subset of R will be denoted by D. 


3.1. LIMITS OF A FUNCTION 

Before defining the notion of a limit of a function, let us understand what is 
meant by the notation $x \to a$, where a and x are elements in R. If a is finite, 
then $x \to a$ means that x can have values that belong to a neighborhood 
$N_r(a)$ of a (see Definition 1.6.1) for any $r > 0$, but $x \ne a$, that is, $0 < |x - a| < r$. 
Such a neighborhood is called a deleted neighborhood of a, that is, a 
neighborhood from which the point a has been removed. If a is infinite ($-\infty$ 
or $+\infty$), then $x \to a$ indicates that $|x|$ can get larger and larger without any 
constraint on the extent of its increase. Thus $|x|$ can have values greater than 
any positive number. In either case, whether a is finite or infinite, we say that 
x tends to a or approaches a. 

Let us now study the behavior of a function f(x) as $x \to a$. 

Definition 3.1.1. Suppose that the function f(x) is defined in a deleted 
neighborhood of a point $a \in R$. Then f(x) is said to have a limit L as $x \to a$ 
if for every $\epsilon > 0$ there exists a $\delta > 0$ such that 

$$|f(x) - L| < \epsilon \tag{3.1}$$

for all x for which 

$$0 < |x - a| < \delta. \tag{3.2}$$

In this case, we write $f(x) \to L$ as $x \to a$, which is equivalent to saying that 
$\lim_{x \to a} f(x) = L$. Less formally, we say that $f(x) \to L$ as $x \to a$ if, however 
small the positive number $\epsilon$ might be, f(x) differs from L by less than $\epsilon$ for 
values of x sufficiently close to a. □ 

Note 3.1.1. When f(x) has a limit as $x \to a$, it is considered to be finite. 
If this is not the case, then f(x) is said to have an infinite limit ($-\infty$ or $+\infty$) 
as $x \to a$. This limit exists only in the extended real number system, which 
consists of the real number system combined with the two symbols $-\infty$ and 
$+\infty$. In this case, for every positive number M there exists a $\delta > 0$ such that 
$|f(x)| > M$ if $0 < |x - a| < \delta$. If a is infinite and L is finite, then $f(x) \to L$ as 
$x \to a$ if for any $\epsilon > 0$ there exists a positive number N such that inequality 
(3.1) is satisfied for all x for which $|x| > N$. In case both a and L are infinite, 
then $f(x) \to L$ as $x \to a$ if for any $B > 0$ there exists a positive number A 
such that $|f(x)| > B$ if $|x| > A$. 

Note 3.1.2. If f(x) has a limit L as $x \to a$, then L must be unique. To 
show this, suppose that $L_1$ and $L_2$ are two limits of f(x) as $x \to a$. Then, for 
any $\epsilon > 0$ there exist $\delta_1 > 0$, $\delta_2 > 0$ such that 

$$|f(x) - L_1| < \frac{\epsilon}{2} \quad \text{if } 0 < |x - a| < \delta_1,$$

$$|f(x) - L_2| < \frac{\epsilon}{2} \quad \text{if } 0 < |x - a| < \delta_2.$$

Hence, if $\delta = \min(\delta_1, \delta_2)$, then 

$$|L_1 - L_2| = |L_1 - f(x) + f(x) - L_2| \le |f(x) - L_1| + |f(x) - L_2| < \epsilon$$

for all x for which $0 < |x - a| < \delta$. Since $|L_1 - L_2|$ is smaller than $\epsilon$, which is 
an arbitrary positive number, we must have $L_1 = L_2$ (why?). 

Note 3.1.3. The limit of f(x) as described in Definition 3.1.1 is actually 
called a two-sided limit. This is because x can approach a from either side. 
There are, however, cases where f(x) can have a limit only when x 
approaches a from one side. Such a limit is called a one-sided limit. 

By definition, if f(x) has a limit as x approaches a from the left, 
symbolically written as $x \to a^-$, then f(x) has a left-sided limit, which we 
denote by $L^-$. In this case we write 

$$\lim_{x \to a^-} f(x) = L^-.$$

If, however, f(x) has a limit as x approaches a from the right, symbolically 
written as $x \to a^+$, then f(x) has a right-sided limit, denoted by $L^+$, that is, 

$$\lim_{x \to a^+} f(x) = L^+.$$

From the above definition it follows that f(x) has a left-sided limit $L^-$ as 
$x \to a^-$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that 

$$|f(x) - L^-| < \epsilon$$

for all x for which $0 < a - x < \delta$. Similarly, f(x) has a right-sided limit $L^+$ as 
$x \to a^+$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that 

$$|f(x) - L^+| < \epsilon$$

for all x for which $0 < x - a < \delta$. 

Obviously, if f(x) has a two-sided limit L as $x \to a$, then $L^-$ and $L^+$ both 
exist and are equal to L. Vice versa, if $L^- = L^+$, then f(x) has a two-sided 
limit L as $x \to a$, where L is the common value of $L^-$ and $L^+$ (why?). We 
can then state that $\lim_{x \to a} f(x) = L$ if and only if 

$$\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = L.$$

Thus to determine if f(x) has a limit as $x \to a$, we first need to find out if it 
has a left-sided limit $L^-$ and a right-sided limit $L^+$ as $x \to a$. If this is the 
case and $L^- = L^+ = L$, then f(x) has a limit L as $x \to a$. 

Throughout the remainder of the book, we shall drop the characterization 
“two-sided” when making a reference to a two-sided limit L of f(x). Instead, 
we shall simply state that L is the limit of f(x). 

Example 3.1.1. Consider the function 

$$f(x) = \begin{cases} (x-1)/(x^2-1), & x \ne -1, 1, \\ 4, & x = 1. \end{cases}$$

This function is defined everywhere except at $x = -1$. Let us find its limit as 
$x \to a$, where $a \in R$. We note that 

$$\lim_{x \to a} f(x) = \lim_{x \to a} \frac{x-1}{x^2-1} = \lim_{x \to a} \frac{1}{x+1}.$$





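This is true even if $a = 1$, because $x \ne a$ as $x \to a$. We now claim that if 
$a \ne -1$, then 

$$\lim_{x \to a} \frac{1}{x+1} = \frac{1}{a+1}.$$

To prove this claim, we need to show that for any given $\epsilon > 0$ there exists a 
$\delta > 0$ such that 

$$\left| \frac{1}{x+1} - \frac{1}{a+1} \right| < \epsilon \tag{3.3}$$

if $0 < |x - a| < \delta$. Let us therefore consider the following two cases: 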
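Case 1. $a > -1$. In this case we have 

$$\left| \frac{1}{x+1} - \frac{1}{a+1} \right| = \frac{|x - a|}{|x+1|\,|a+1|}. \tag{3.4}$$

If $|x - a| < \delta$, then 

$$a - \delta + 1 < x + 1 < a + \delta + 1. \tag{3.5}$$

Since $a + 1 > 0$, we can choose $\delta > 0$ such that $a - \delta + 1 > 0$, that is, 
$\delta < a + 1$. From (3.4) and (3.5) we then get 

$$\left| \frac{1}{x+1} - \frac{1}{a+1} \right| < \frac{\delta}{(a+1)(a-\delta+1)}.$$

Let us constrain $\delta$ even further by requiring that 

$$\frac{\delta}{(a+1)(a-\delta+1)} < \epsilon.$$

This is accomplished by choosing $\delta > 0$ so that 

$$\delta < \frac{(a+1)^2\epsilon}{1 + (a+1)\epsilon}.$$

Since 

$$\frac{(a+1)^2\epsilon}{1 + (a+1)\epsilon} < a + 1,$$

inequality (3.3) will be satisfied by all x for which $|x - a| < \delta$, where 

$$0 < \delta < \frac{(a+1)^2\epsilon}{1 + (a+1)\epsilon}. \tag{3.6}$$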





Case 2. $a < -1$. Here, we choose $\delta > 0$ such that $a + \delta + 1 < 0$, that is, 
$\delta < -(a + 1)$. From (3.5) we conclude that 

$$|x + 1| > -(a + \delta + 1).$$

Hence, from (3.4) we get 

$$\left| \frac{1}{x+1} - \frac{1}{a+1} \right| < \frac{\delta}{(a+1)(a+\delta+1)}.$$

As before, we further constrain $\delta$ by requiring that it satisfy the inequality 

$$\frac{\delta}{(a+1)(a+\delta+1)} < \epsilon,$$

or equivalently, the inequality 

$$\delta < \frac{(a+1)^2\epsilon}{1 - (a+1)\epsilon}.$$

Note that 

$$\frac{(a+1)^2\epsilon}{1 - (a+1)\epsilon} < -(a + 1).$$

Consequently, inequality (3.3) can be satisfied by choosing $\delta$ such that 

$$0 < \delta < \frac{(a+1)^2\epsilon}{1 - (a+1)\epsilon}. \tag{3.7}$$


Cases 1 and 2 can be combined by rewriting (3.6) and (3.7) using the single 
double inequality 

$$0 < \delta < \frac{|a+1|^2\epsilon}{1 + |a+1|\epsilon}.$$

If $a = -1$, then no limit exists as $x \to a$. This is because 

$$\lim_{x \to -1} f(x) = \lim_{x \to -1} \frac{1}{x+1}.$$

If $x \to -1^-$, then 

$$\lim_{x \to -1^-} \frac{1}{x+1} = -\infty$$

and, as $x \to -1^+$, 

$$\lim_{x \to -1^+} \frac{1}{x+1} = \infty.$$

Since the left-sided and right-sided limits are not equal, no limit exists as 
$x \to -1$. 

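Example 3.1.2. Let f(x) be defined as 

$$f(x) = \begin{cases} 1 + \sqrt{x}, & x > 0, \\ x, & x \le 0. \end{cases}$$

This function has no limit as $x \to 0$, since 

$$\lim_{x \to 0^-} f(x) = \lim_{x \to 0^-} x = 0, \qquad \lim_{x \to 0^+} f(x) = \lim_{x \to 0^+} (1 + \sqrt{x}) = 1.$$

However, for any $a \ne 0$, $\lim_{x \to a} f(x)$ exists. 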
Example 3.1.3. Let f(x) be given by 

$$f(x) = x\cos x, \quad x \ne 0.$$

Figure 3.1. The graph of the function f(x). 

Then $\lim_{x \to 0} f(x) = 0$. This is true because $|f(x)| \le |x|$ in any deleted 
neighborhood of $a = 0$. As $x \to \infty$, f(x) oscillates unboundedly, since 

$$-x \le x\cos x \le x \quad (x > 0),$$

and the lower and upper bounds are attained at $x = (2n+1)\pi$ and $x = 2n\pi$, respectively, 
for every positive integer n. Thus f(x) has no limit as $x \to \infty$. A similar conclusion 
can be reached when $x \to -\infty$ (see Figure 3.1). 


3.2. SOME PROPERTIES ASSOCIATED WITH LIMITS OF FUNCTIONS 

The following theorems give some fundamental properties associated with 
function limits. 

Theorem 3.2.1. Let f(x) and g(x) be real-valued functions defined on 
$D \subset R$. Suppose that $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$. Then 

1. $\lim_{x \to a}[f(x) + g(x)] = L + M$, 

2. $\lim_{x \to a}[f(x)g(x)] = LM$, 

3. $\lim_{x \to a}[1/g(x)] = 1/M$ if $M \ne 0$, 

4. $\lim_{x \to a}[f(x)/g(x)] = L/M$ if $M \ne 0$. 

Proof. We shall only prove parts 2 and 3. The proof of part 1 is 
straightforward, and part 4 results from applying parts 2 and 3. 

Proof of Part 2. Consider the following three cases: 

Case 1. Both L and M are finite. Let $\epsilon > 0$ be given and let $\tau > 0$ be 
such that 

$$\tau(\tau + |L| + |M|) < \epsilon. \tag{3.8}$$

This inequality is satisfied by all values of $\tau$ for which 

$$0 < \tau < \frac{-(|L| + |M|) + \sqrt{(|L| + |M|)^2 + 4\epsilon}}{2}.$$

Now, there exist $\delta_1 > 0$, $\delta_2 > 0$ such that 

$$|f(x) - L| < \tau \quad \text{if } 0 < |x - a| < \delta_1,$$

$$|g(x) - M| < \tau \quad \text{if } 0 < |x - a| < \delta_2.$$

Then, for any x such that $0 < |x - a| < \delta$, where $\delta = \min(\delta_1, \delta_2)$, 

$$\begin{aligned} |f(x)g(x) - LM| &= |M[f(x) - L] + f(x)[g(x) - M]| \\ &\le \tau|M| + \tau|f(x)| \\ &\le \tau|M| + \tau[|L| + |f(x) - L|] \\ &< \tau(\tau + |L| + |M|) \\ &< \epsilon, \end{aligned}$$

which proves part 2. 
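The bound on $\tau$ is the positive root of $\tau^2 + (|L| + |M|)\tau - \epsilon = 0$. The sketch below computes it in the algebraically equivalent form $2\epsilon/\{(|L|+|M|) + \sqrt{(|L|+|M|)^2 + 4\epsilon}\}$, which avoids floating-point cancellation when $\epsilon$ is small relative to $|L| + |M|$ (a numerical choice of ours, not the book's). It then checks the estimate of Case 1 on the illustrative functions $f(x) = L + (x - a)$ and $g(x) = M + (x - a)$, for which $|f(x) - L| < \tau$ exactly when $|x - a| < \tau$, so that $\delta = \tau$ can be used.

```python
import math


def tau_bound(L: float, M: float, eps: float) -> float:
    """Positive root of t^2 + (|L| + |M|) t - eps = 0, in a cancellation-free form."""
    s = abs(L) + abs(M)
    return 2 * eps / (s + math.sqrt(s * s + 4 * eps))


def product_within(L: float, M: float, eps: float, a: float = 0.0,
                   samples: int = 1000) -> bool:
    """Check |f(x)g(x) - LM| < eps for f(x) = L + (x - a), g(x) = M + (x - a),
    at sampled x with 0 < |x - a| < tau."""
    tau = 0.5 * tau_bound(L, M, eps)             # any tau strictly below the root works
    assert tau * (tau + abs(L) + abs(M)) < eps   # inequality (3.8)
    for i in range(1, samples + 1):
        h = i / (samples + 1) * tau              # 0 < h < tau
        for x in (a - h, a + h):
            f, g = L + (x - a), M + (x - a)
            if abs(f * g - L * M) >= eps:
                return False
    return True


for L, M, eps in [(2.0, -3.0, 0.1), (0.0, 0.0, 1e-4), (100.0, 50.0, 1e-6)]:
    print(L, M, eps, tau_bound(L, M, eps), product_within(L, M, eps))
```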


Case 2. One of L and M is finite and the other is infinite. Without any 
loss of generality, we assume that L is finite and $M = \infty$. Let us also assume 
that $L \ne 0$, since $0 \cdot \infty$ is indeterminate. Let $A > 0$ be given. There exists a 
$\delta_1 > 0$ such that $|f(x)| > |L|/2$ if $0 < |x - a| < \delta_1$ (why?). Also, there exists 
a $\delta_2 > 0$ such that $|g(x)| > 2A/|L|$ if $0 < |x - a| < \delta_2$. Let $\delta = \min(\delta_1, \delta_2)$. 
If $0 < |x - a| < \delta$, then 

$$|f(x)g(x)| = |f(x)|\,|g(x)| > \frac{|L|}{2} \cdot \frac{2A}{|L|} = A.$$

This means that $\lim_{x \to a} f(x)g(x) = \infty$, which proves part 2. 


Case 3. Both L and M are infinite. Suppose that $L = M = \infty$. In this 
case, for a given $B > 0$ there exist $\kappa_1 > 0$, $\kappa_2 > 0$ such that 

$$|f(x)| > \sqrt{B} \quad \text{if } 0 < |x - a| < \kappa_1,$$

$$|g(x)| > \sqrt{B} \quad \text{if } 0 < |x - a| < \kappa_2.$$

Then, 

$$|f(x)g(x)| > B \quad \text{if } 0 < |x - a| < \kappa,$$

where $\kappa = \min(\kappa_1, \kappa_2)$. This implies that $\lim_{x \to a} f(x)g(x) = \infty$, which proves 
part 2. 


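Proof of Part 3. Let $\epsilon > 0$ be given. If $M \ne 0$, then there exists a $\Delta_1 > 0$ 
such that $|g(x)| > |M|/2$ if $0 < |x - a| < \Delta_1$. Also, there exists a $\Delta_2 > 0$ such 
that $|g(x) - M| < \epsilon M^2/2$ if $0 < |x - a| < \Delta_2$. Then, 

$$\left| \frac{1}{g(x)} - \frac{1}{M} \right| = \frac{|g(x) - M|}{|g(x)|\,|M|} < \frac{2|g(x) - M|}{M^2} < \epsilon$$

if $0 < |x - a| < \Delta$, where $\Delta = \min(\Delta_1, \Delta_2)$. □ 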





Theorem 3.2.1 is also true if L and M are one-sided limits of f(x) and 
g(x), respectively. 

Theorem 3.2.2. If $f(x) \le g(x)$, then $\lim_{x \to a} f(x) \le \lim_{x \to a} g(x)$. 

Proof. Let $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$. Suppose that $L - M > 0$. 

By Theorem 3.2.1, $L - M$ is the limit of the function $h(x) = f(x) - g(x)$. 
Therefore, there exists a $\delta > 0$ such that 

$$|h(x) - (L - M)| < \frac{L - M}{2}, \tag{3.9}$$

if $0 < |x - a| < \delta$. Inequality (3.9) implies that $h(x) > (L - M)/2 > 0$, which 
is not possible, since, by assumption, $h(x) = f(x) - g(x) \le 0$. We must then 
have $L - M \le 0$. □ 


3.3. THE O, o NOTATION 

These symbols provide a convenient way to describe the limiting behavior of 
a function f(x) as x tends to a certain limit. 

Let f(x) and g(x) be two functions defined on $D \subset R$. The function g(x) 
is positive and usually has a simple form such as 1, x, or 1/x. Suppose that 
there exists a positive number K such that 

$$|f(x)| \le K g(x)$$

for all $x \in E$, where $E \subset D$. Then, f(x) is said to be of an order of magnitude 
not exceeding that of g(x). This fact is denoted by writing 

$$f(x) = O(g(x))$$

for all $x \in E$. In particular, if $g(x) = 1$, then f(x) is necessarily a bounded 
function on E. For example, 

$\cos x = O(1)$ for all x, 

$x = O(x^2)$ for large values of x, 

$x^2 + 2x = O(x^2)$ for large values of x, 

$\sin x = O(|x|)$ for all x. 

The last relationship is true because 

$$\left| \frac{\sin x}{x} \right| \le 1$$

for all $x \ne 0$, where x is measured in radians; at $x = 0$, $|\sin x| \le |x|$ holds trivially. 





Let us now suppose that the relationship between f(x) and g(x) is such 
that 

$$\lim_{x \to a} \frac{f(x)}{g(x)} = 0.$$

Then we say that f(x) is of a smaller order of magnitude than g(x) in a 
deleted neighborhood of a. This fact is denoted by writing 

$$f(x) = o(g(x)) \quad \text{as } x \to a,$$

which, when g(x) itself tends to zero, is equivalent to saying that f(x) tends to zero 
more rapidly than g(x) as $x \to a$. The o symbol can also be used when x tends to 
infinity. In this case we write 

$$f(x) = o(g(x)) \quad \text{for } x > A,$$

where A is some positive number. For example, 

= c'(x) as X ^ 0, 
tan x^ = o(x^) as X ^ 0, 

$\sqrt{x} = o(x)$ as $x \to \infty$. 

If f(x) and g(x) are any two functions such that 

$$\frac{f(x)}{g(x)} \to 1 \quad \text{as } x \to a,$$

then f(x) and g(x) are said to be asymptotically equal, written symbolically 
$f(x) \sim g(x)$, as $x \to a$. For example, 

$x^2 \sim x^2 + 3x + 1$ as $x \to \infty$, 

$\sin x \sim x$ as $x \to 0$. 
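The examples above can be made concrete by tabulating the ratios that define them. The sketch below is illustrative only: watching $f(x)/g(x)$ settle toward 1 (for $\sim$) or toward 0 (for o) along a few sample points is evidence, not a proof of the limit.

```python
import math


def ratios(f, g, xs):
    """f(x)/g(x) at each sample point."""
    return [f(x) / g(x) for x in xs]


large = [10.0 ** p for p in range(1, 7)]     # x -> infinity
small = [10.0 ** -p for p in range(1, 7)]    # x -> 0

r_poly = ratios(lambda x: x * x, lambda x: x * x + 3 * x + 1, large)  # x^2 ~ x^2 + 3x + 1
r_sin = ratios(math.sin, lambda x: x, small)                          # sin x ~ x
r_sqrt = ratios(math.sqrt, lambda x: x, large)                        # sqrt(x) = o(x)

for name, r in [("x^2/(x^2+3x+1)", r_poly), ("sin(x)/x", r_sin), ("sqrt(x)/x", r_sqrt)]:
    print(f"{name:16}", " ".join(f"{v:.8f}" for v in r))
```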

On the basis of the above definitions, the following properties can be 
deduced: 

1. $O(f(x) + g(x)) = O(f(x)) + O(g(x))$. 

2. $O(f(x)g(x)) = O(f(x))\,O(g(x))$. 

3. $o(f(x)g(x)) = O(f(x))\,o(g(x))$. 

4. If $f(x) \sim g(x)$ as $x \to a$, then $f(x) = g(x) + o(g(x))$ as $x \to a$. 

3.4. CONTINUOUS FUNCTIONS 

A function f(x) may have a limit L as $x \to a$. This limit, however, may not be 
equal to the value of the function at $x = a$. In fact, the function may not even 
be defined at this point. If f(x) is defined at $x = a$ and $L = f(a)$, then f(x) is 
said to be continuous at $x = a$. 


Definition 3.4.1. Let $f: D \to R$, where $D \subset R$, and let $a \in D$. Then f(x) 
is continuous at $x = a$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that 

$$|f(x) - f(a)| < \epsilon$$

for all $x \in D$ for which $|x - a| < \delta$. 

It is important here to note that in order for f(x) to be continuous at 
$x = a$, it is necessary that it be defined at $x = a$ as well as at all other points 
inside a neighborhood $N_r(a)$ of the point a for some $r > 0$. Thus to show 
continuity of f(x) at $x = a$, the following conditions must be verified: 

1. f(x) is defined at all points inside a neighborhood of the point a. 

2. f(x) has a limit from the left and a limit from the right as $x \to a$, and 
these two limits are equal to a common value L. 

3. The value of f(x) at $x = a$ is equal to L. 

For convenience, we shall denote the left-sided and right-sided limits of 
f(x) as $x \to a$ by $f(a^-)$ and $f(a^+)$, respectively. 

If any of the above conditions is violated, then f(x) is said to be 
discontinuous at $x = a$. There are two kinds of discontinuity. □ 

Definition 3.4.2. A function $f: D \to R$ has a discontinuity of the first kind 
at $x = a$ if $f(a^-)$ and $f(a^+)$ exist, but at least one of them is different from 
$f(a)$. The function f(x) has a discontinuity of the second kind at the same 
point if at least one of $f(a^-)$ and $f(a^+)$ does not exist. □ 


Definition 3.4.3. A function $f: D \to R$ is continuous on $E \subset D$ if it is 
continuous at every point of E. 

For example, the function 

$$f(x) = \begin{cases} \dfrac{x-1}{x^2-1}, & x > 0,\ x \ne 1, \\[4pt] \dfrac{1}{2}, & x = 1, \end{cases}$$

is defined for all $x > 0$ and is continuous at $x = 1$. This is true because, as 
was shown in Example 3.1.1, 

$$\lim_{x \to 1} f(x) = \lim_{x \to 1} \frac{1}{x+1} = \frac{1}{2},$$

which is equal to the value of the function at $x = 1$. Furthermore, f(x) is 
continuous at all other points of its domain. Note that if $f(1)$ were different 
from $\frac{1}{2}$, then f(x) would have a discontinuity of the first kind at $x = 1$. 

Let us now consider the function 

$$f(x) = \begin{cases} x + 1, & x > 0, \\ 0, & x = 0, \\ x - 1, & x < 0. \end{cases}$$

This function is continuous everywhere except at $x = 0$, since it has no limit 
as $x \to 0$ by the fact that $f(0^-) = -1$ and $f(0^+) = 1$. The discontinuity at this 
point is therefore of the first kind. 

An example of a discontinuity of the second kind is given by the function 

$$f(x) = \begin{cases} \cos\dfrac{1}{x}, & x \ne 0, \\[4pt] 0, & x = 0, \end{cases}$$

which has a discontinuity of the second kind at $x = 0$, since neither $f(0^-)$ nor 
$f(0^+)$ exists. 


Definition 3.4.4. A function $f: D \to R$ is left-continuous at $x = a$ if 
$\lim_{x \to a^-} f(x) = f(a)$. It is right-continuous at $x = a$ if $\lim_{x \to a^+} f(x) = f(a)$. □ 


Obviously, a left-continuous or a right-continuous function is not necessarily 
continuous. In order for f(x) to be continuous at $x = a$ it is necessary and 
sufficient that f(x) be both left-continuous and right-continuous at this point. 
For example, the function 

$$f(x) = \begin{cases} x - 1, & x \le 0, \\ 1, & x > 0, \end{cases}$$

is left-continuous at $x = 0$, since $f(0^-) = -1 = f(0)$. If f(x) were defined so 
that $f(x) = x - 1$ for $x < 0$ and $f(x) = 1$ for $x \ge 0$, then it would be 
right-continuous at $x = 0$. 


Definition 3.4.5. The function $f: D \to R$ is uniformly continuous on $E \subset D$ 
if for every $\epsilon > 0$ there exists a $\delta > 0$ such that 

$$|f(x_1) - f(x_2)| < \epsilon \tag{3.10}$$

for all $x_1, x_2 \in E$ for which $|x_1 - x_2| < \delta$. □ 

This definition appears to be identical to the definition of continuity. That 
is not exactly the case. Uniform continuity is always associated with a set such 
as E in Definition 3.4.5, whereas continuity can be defined at a single point 
a. Furthermore, inequality (3.10) is true for all pairs of points $x_1, x_2 \in E$ such 
that $|x_1 - x_2| < \delta$. Hence, $\delta$ depends only on $\epsilon$, not on the particular 
locations of $x_1, x_2$. On the other hand, in the definition of continuity 
(Definition 3.4.1) $\delta$ depends on $\epsilon$ as well as on the location of the point 
where continuity is considered. In other words, $\delta$ can change from one point 
to another for the same given $\epsilon > 0$. If, however, for a given $\epsilon > 0$, the same $\delta$ 
can be used with all points in some set $E \subset D$, then f(x) is uniformly 
continuous on E. For this reason, whenever f(x) is uniformly continuous on 
E, $\delta$ can be described as being “portable,” which means it can be used 
everywhere inside E provided that $\epsilon > 0$ remains unchanged. 

Obviously, if f(x) is uniformly continuous on E, then it is continuous 
there. The converse, however, is not true. For example, consider the function 
$f: (0, 1) \to R$ given by $f(x) = 1/x$. Here, f(x) is continuous at all points of 
$E = (0, 1)$, but is not uniformly continuous there. To demonstrate this fact, let 
us first show that f(x) is continuous on E. Let $\epsilon > 0$ be given and let $a \in E$. 
Since $a > 0$, there exists a $\delta_1 > 0$ such that the neighborhood $N_{\delta_1}(a)$ is a 
subset of E. This can be accomplished by choosing $\delta_1$ such that $0 < \delta_1 < a$. 

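Now, for all $x \in N_{\delta_1}(a)$, 

$$\left| \frac{1}{x} - \frac{1}{a} \right| = \frac{|x - a|}{ax} < \frac{\delta_1}{a(a - \delta_1)}.$$

Let $\delta_2 > 0$ be such that for the given $\epsilon > 0$, 

$$\frac{\delta_2}{a(a - \delta_2)} < \epsilon,$$

which can be satisfied by requiring that 

$$0 < \delta_2 < \frac{a^2\epsilon}{1 + a\epsilon}.$$

Since 

$$\frac{a^2\epsilon}{1 + a\epsilon} < a,$$

then 

$$\left| \frac{1}{x} - \frac{1}{a} \right| < \epsilon \tag{3.11}$$

if $|x - a| < \delta$, where 

$$\delta < \min\left(a, \frac{a^2\epsilon}{1 + a\epsilon}\right) = \frac{a^2\epsilon}{1 + a\epsilon}.$$

It follows that $f(x) = 1/x$ is continuous at every point of E. We note here 
the dependence of $\delta$ on both $\epsilon$ and a. 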

Let us now demonstrate that $f(x) = 1/x$ is not uniformly continuous on 
E. Define G to be the set 

$$G = \left\{ \frac{a^2\epsilon}{1 + a\epsilon} \;\middle|\; a \in E \right\}.$$

In order for $f(x) = 1/x$ to be uniformly continuous on E, the infimum of 
G must be positive. If this is possible, then for a given $\epsilon > 0$, (3.11) will be 
satisfied by all x for which 

$$|x - a| < \inf(G),$$

and for all $a \in (0, 1)$. However, this cannot happen, since $\inf(G) = 0$. Thus it 
is not possible to find a single $\delta$ for which (3.11) will work for all $a \in (0, 1)$. 
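Both halves of this discussion can be made concrete. The first function below evaluates the bound $a^2\epsilon/(1 + a\epsilon)$ on $\delta$ and shows it shrinking to 0 as $a \to 0^+$. The second is a complementary construction of our own, not the book's argument: for any proposed $\delta > 0$ it returns two points of (0, 1) closer than $\delta$ whose images under $1/x$ differ by at least $\epsilon$, which is exactly the negation of Definition 3.4.5 for this function.

```python
def delta_at(a: float, eps: float) -> float:
    """Bound a^2 eps / (1 + a eps) on delta for continuity of 1/x at x = a."""
    return a * a * eps / (1 + a * eps)


def witness(delta: float, eps: float):
    """Return x1 < x2 in (0, 1) with x2 - x1 < delta and 1/x1 - 1/x2 >= eps.

    With d = min(delta, 1/2), take x1 = t and x2 = t + d/2, where
    t = min(d/2, 1/(4 eps)).  Since t <= d/2, t (t + d/2) <= t d, so
    1/x1 - 1/x2 = (d/2) / (t (t + d/2)) >= 1/(2t) >= 2 eps.
    """
    d = min(delta, 0.5)
    t = min(d / 2, 1 / (4 * eps))
    return t, t + d / 2


eps = 0.5
for a in (0.5, 0.1, 0.01, 0.001):
    print(f"a = {a}: delta must be below {delta_at(a, eps):.3e}")

for delta in (1.0, 0.1, 1e-3, 1e-6):
    x1, x2 = witness(delta, eps)
    print(delta, x2 - x1, 1 / x1 - 1 / x2)
```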

Let us now try another function defined on the same set $E = (0, 1)$, 
namely, the function $f(x) = x^2$. In this case, for a given $\epsilon > 0$, let $\delta > 0$ be 
such that 

$$\delta^2 + 2\delta a - \epsilon < 0. \tag{3.12}$$

Then, for any $a \in E$, if $|x - a| < \delta$ we get 

$$\begin{aligned} |x^2 - a^2| &= |x - a|\,|x + a| \\ &= |x - a|\,|x - a + 2a| \\ &< \delta(\delta + 2a) < \epsilon. \end{aligned} \tag{3.13}$$

It is easy to see that this inequality is satisfied by all $\delta > 0$ for which 

$$0 < \delta < -a + \sqrt{a^2 + \epsilon}. \tag{3.14}$$

If H is the set 

$$H = \left\{ -a + \sqrt{a^2 + \epsilon} \;\middle|\; a \in E \right\},$$

then it can be verified that $\inf(H) = -1 + \sqrt{1 + \epsilon}$. Hence, by choosing $\delta$ 
such that 

$$\delta < \inf(H),$$

inequality (3.14), and hence (3.13), will be satisfied for all $a \in E$. The 
function $f(x) = x^2$ is therefore uniformly continuous on E. 
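By contrast with $1/x$, for $f(x) = x^2$ a single $\delta$ serves every $a \in (0, 1)$. The sketch below (names are ours) takes $\delta = 0.99\,(-1 + \sqrt{1 + \epsilon})$, just below $\inf(H)$, and records the largest value of $|x^2 - a^2|$ over a grid of points $a, x \in (0, 1)$ with $|x - a| < \delta$; by (3.13) it should stay below $\epsilon$. The grid check is a spot check, not a substitute for the inequality.

```python
import math


def uniform_delta(eps: float) -> float:
    """A delta just below inf(H) = -1 + sqrt(1 + eps)."""
    return 0.99 * (-1 + math.sqrt(1 + eps))


def worst_gap(eps: float, n: int = 200) -> float:
    """max |x^2 - a^2| over grid points a, x in (0, 1) with |x - a| < delta."""
    delta = uniform_delta(eps)
    worst = 0.0
    for i in range(1, n):
        a = i / n
        for j in range(-n + 1, n):
            x = a + delta * j / n            # |x - a| < delta
            if 0 < x < 1:
                worst = max(worst, abs(x * x - a * a))
    return worst


for eps in (1.0, 0.1, 1e-4):
    print(eps, uniform_delta(eps), worst_gap(eps))
```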





The above examples demonstrate that continuity and uniform continuity 
on a set E are not always equivalent. They are, however, equivalent under 
certain conditions on E. This will be illustrated in Theorem 3.4.6 in the next 
subsection. 

3.4.1. Some Properties of Continuous Functions 

Continuous functions have some interesting properties, some of which are 
given in the following theorems: 

Theorem 3.4.1. Let f(x) and g(x) be two continuous functions defined 
on a set $D \subset R$. Then: 

1. $f(x) + g(x)$ and $f(x)g(x)$ are continuous on D. 

2. $\alpha f(x)$ is continuous on D, where $\alpha$ is a constant. 

3. $f(x)/g(x)$ is continuous on D provided that $g(x) \ne 0$ on D. 

Proof. The proof is left as an exercise. □ 

Theorem 3.4.2. Suppose that $f: D \to R$ is continuous on D, and 
$g: f(D) \to R$ is continuous on $f(D)$, the image of D under f. Then the 
composite function $h: D \to R$ defined as $h(x) = g[f(x)]$ is continuous on D. 

Proof. Let $\epsilon > 0$ be given, and let $a \in D$. Since g is continuous at $f(a)$, 
there exists a $\delta' > 0$ such that $|g[f(x)] - g[f(a)]| < \epsilon$ if $|f(x) - f(a)| < \delta'$. 
Since f(x) is continuous at $x = a$, there exists a $\delta > 0$ such that $|f(x) - f(a)| 
< \delta'$ if $|x - a| < \delta$. It follows that by taking $|x - a| < \delta$ we must have 
$|h(x) - h(a)| < \epsilon$. □ 

Theorem 3.4.3. If f(x) is continuous at $x = a$ and $f(a) > 0$, then there 
exists a neighborhood $N_\delta(a)$ in which $f(x) > 0$. 

Proof. Since f(x) is continuous at $x = a$, there exists a $\delta > 0$ such that 

$$|f(x) - f(a)| < \tfrac{1}{2}f(a),$$

if $|x - a| < \delta$. This implies that 

$$f(x) > \tfrac{1}{2}f(a) > 0$$

for all $x \in N_\delta(a)$. □ 

Theorem 3.4.4 (The Intermediate-Value Theorem). Let $f: D \to R$ be 
continuous, and let $[a, b]$ be a closed interval contained in D. Suppose that 
$f(a) < f(b)$. If $\lambda$ is a number such that $f(a) < \lambda < f(b)$, then there exists a 
point c, where $a < c < b$, such that $\lambda = f(c)$. 

Proof. Let $g: D \to R$ be defined as $g(x) = f(x) - \lambda$. This function is 
continuous and is such that $g(a) < 0$, $g(b) > 0$. Consider the set 

$$S = \{x \in [a, b] \mid g(x) < 0\}.$$

This set is nonempty, since $a \in S$, and is bounded from above by b. Hence, by 
Theorem 1.5.1 the least upper bound of S exists. Let $c = \mathrm{lub}(S)$. Since 
$S \subset [a, b]$, then $c \in [a, b]$. 

Now, for every positive integer n, there exists a point $x_n \in S$ such that 

$$c - \frac{1}{n} < x_n \le c.$$

Otherwise, if $x \le c - 1/n$ for all $x \in S$, then $c - 1/n$ will be an upper bound 
of S, contrary to the definition of c. Consequently, $\lim_{n \to \infty} x_n = c$. Since g(x) 
is continuous on $[a, b]$, then 

$$g(c) = \lim_{n \to \infty} g(x_n) \le 0, \tag{3.15}$$

by Theorem 3.2.2 and the fact that $g(x_n) < 0$. From (3.15) we conclude that 
$c < b$, since $g(b) > 0$. 

Let us suppose that $g(c) < 0$. Then, by Theorem 3.4.3 (applied to $-g$), there exists a 
neighborhood $N_\delta(c)$, for some $\delta > 0$, such that $g(x) < 0$ for all $x \in N_\delta(c) \cap 
[a, b]$. Consequently, there exists a point $x_0 \in [a, b]$ such that $c < x_0 < c + \delta$ 
and $g(x_0) < 0$. This means that $x_0$ belongs to S and is greater than c, 
a contradiction. Therefore, by inequality (3.15) we must have $g(c) = 0$, that 
is, $f(c) = \lambda$. We note that $c > a$, since $c \ge a$, but $c \ne a$. This last is true 
because if $a = c$, then $g(c) < 0$, a contradiction. This completes the proof of 
the theorem. □ 

The direct implication of the intermediate-value theorem is that a continuous 
function possesses the property of assuming at least once every value 
between any two distinct values that it takes on a closed interval contained in its domain. 
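The proof above locates c as the least upper bound of S. A standard constructive companion, substituted here for illustration (it is not the argument of the text), is bisection: repeatedly halve the interval, keeping the half whose endpoints still bracket $\lambda$. The Python sketch below assumes, as the theorem does, that f is continuous on $[a, b]$ with $f(a) < \lambda < f(b)$; the function name, tolerance, and iteration cap are our own choices.

```python
import math
from typing import Callable


def intermediate_point(f: Callable[[float], float], a: float, b: float,
                       lam: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a point c in (a, b) with f(c) = lam by bisection.

    Assumes f is continuous on [a, b] and f(a) < lam < f(b), as in Theorem 3.4.4.
    Invariant: g(lo) < 0 <= g(hi) for g(x) = f(x) - lam, so [lo, hi] always
    contains a zero of g.
    """
    lo, hi = a, b
    if not (f(lo) < lam < f(hi)):
        raise ValueError("need f(a) < lam < f(b)")
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        if f(mid) - lam < 0:
            lo = mid        # mid belongs to the set S of the proof
        else:
            hi = mid
    return (lo + hi) / 2


c = intermediate_point(lambda x: x ** 3 + x, 0.0, 2.0, 3.0)
print(c, c ** 3 + c)
r = intermediate_point(math.exp, 0.0, 1.0, 2.0)
print(r, math.log(2))
```

Each step halves the bracketing interval, so after m steps the returned midpoint is within $(b - a)/2^{m+1}$ of a solution of $f(c) = \lambda$.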

Theorem 3.4.5. Suppose that $f: D \to R$ is continuous and that D is 
closed and bounded. Then f(x) is bounded in D. 

Proof. Let a be the greatest lower bound of D, which exists because D is 
bounded. Since D is closed, then $a \in D$ (why?). Furthermore, since f(x) 
is continuous, then for a given $\epsilon > 0$ there exists a $\delta_1 > 0$ such that 

$$f(a) - \epsilon < f(x) < f(a) + \epsilon$$

if $|x - a| < \delta_1$. The function f(x) is therefore bounded in $N_{\delta_1}(a)$. Define $\mathcal{A}$ to 
be the set 

$$\mathcal{A} = \{x \in D \mid f(x) \text{ is bounded}\}.$$

This set is nonempty and bounded, and $N_{\delta_1}(a) \cap D \subset \mathcal{A}$. We need to show 
that $D - \mathcal{A}$ is an empty set. 

As before, the least upper bound of $\mathcal{A}$ exists (since it is bounded) and 
belongs to D (since D is closed). Let $c = \mathrm{lub}(\mathcal{A})$. By the continuity of f(x), 
there exists a neighborhood $N_{\delta_2}(c)$ in which f(x) is bounded for some $\delta_2 > 0$. 
If $D - \mathcal{A}$ is nonempty, then $N_{\delta_2}(c) \cap (D - \mathcal{A})$ is also nonempty [if $N_{\delta_2}(c) \subset \mathcal{A}$, 
then $c + (\delta_2/2) \in \mathcal{A}$, a contradiction]. Let $x_0 \in N_{\delta_2}(c) \cap (D - \mathcal{A})$. Then, on 
one hand, $f(x_0)$ is not bounded, since $x_0 \in (D - \mathcal{A})$. On the other hand, 
$f(x_0)$ must be bounded, since $x_0 \in N_{\delta_2}(c)$. This contradiction leads us to 
conclude that $D - \mathcal{A}$ must be empty and that f(x) is bounded in D. □ 


Corollary 3.4.1. If $f: D \to R$ is continuous, where D is closed and 
bounded, then f(x) achieves its infimum and supremum at least once in D, 
that is, there exist $\xi, \eta \in D$ such that 

$$f(\xi) \le f(x) \quad \text{for all } x \in D,$$

$$f(\eta) \ge f(x) \quad \text{for all } x \in D.$$

Equivalently, 

$$f(\xi) = \inf_{x \in D} f(x), \qquad f(\eta) = \sup_{x \in D} f(x).$$


Proof. By Theorem 3.4.5, $f(D)$ is a bounded set. Hence, its least upper 
bound exists. Let $M = \mathrm{lub}\, f(D)$, which is the same as $\sup_{x \in D} f(x)$. If there 
exists no point x in D for which $f(x) = M$, then $M - f(x) > 0$ for all $x \in D$. 
Consequently, $1/[M - f(x)]$ is continuous on D by Theorem 3.4.1, and is 
hence bounded there by Theorem 3.4.5. 

Now, if $\delta > 0$ is any given positive number, we can find a value x for which 
$f(x) > M - \delta$, or 

$$\frac{1}{M - f(x)} > \frac{1}{\delta}.$$

This implies that $1/[M - f(x)]$ is not bounded, a contradiction. Therefore, 
there must exist a point $\eta \in D$ at which $f(\eta) = M$. 

The proof concerning the existence of a point $\xi \in D$ such that $f(\xi) = 
\inf_{x \in D} f(x)$ is similar. □ 

The requirement that D be closed in Corollary 3.4.1 is essential. For 
example, the function $f(x) = 2x - 1$, which is defined on $D = \{x \mid 0 < x < 1\}$, 
cannot achieve its infimum, namely $-1$, in D. For if there exists a $\xi \in D$ such 
that $f(\xi) \le 2x - 1$ for all $x \in D$, then there exists a $\delta > 0$ such that $0 < \xi - \delta$. 
Hence, 

$$f(\xi - \delta) = 2\xi - 2\delta - 1 < f(\xi),$$

a contradiction. 


Theorem 3.4.6. Let $f: D \to R$ be continuous on $D$. If $D$ is closed and bounded, then $f$ is uniformly continuous on $D$.

Proof. Suppose that $f$ is not uniformly continuous. Then, by using the logical negation of the statement concerning uniform continuity in Definition 3.4.5, we may conclude that there exists an $\epsilon > 0$ such that for every $\delta > 0$, we can find $x_1, x_2 \in D$ with $|x_1 - x_2| < \delta$ for which $|f(x_1) - f(x_2)| \ge \epsilon$. On this basis, by choosing $\delta = 1$, we can find $u_1, v_1 \in D$ with $|u_1 - v_1| < 1$ for which $|f(u_1) - f(v_1)| \ge \epsilon$. Similarly, we can find $u_2, v_2 \in D$ with $|u_2 - v_2| < \frac{1}{2}$ for which $|f(u_2) - f(v_2)| \ge \epsilon$. By continuing in this process we can find $u_n, v_n \in D$ with $|u_n - v_n| < 1/n$ for which $|f(u_n) - f(v_n)| \ge \epsilon$, $n = 3, 4, \ldots$.

Now, let $S$ be the set

$$S = \{u_1, u_2, \ldots, u_n, \ldots\}.$$

This set is bounded, since $S \subset D$. Hence, its least upper bound exists. Let $c = \mathrm{lub}(S)$. Since $D$ is closed, then $c \in D$. Thus, as in the proof of Theorem 3.4.4, we can find points $u_{n_1}, u_{n_2}, \ldots, u_{n_k}, \ldots$ in $S$ such that $\lim_{k \to \infty} u_{n_k} = c$.

Since $f(x)$ is continuous, there exists a $\delta' > 0$ such that

$$|f(x) - f(c)| < \frac{\epsilon}{2},$$

if $|x - c| < \delta'$ for any given $\epsilon > 0$. Let us next choose $k$ large enough such that if $n_k > N$, where $N$ is some large positive number, then

$$|u_{n_k} - c| < \frac{\delta'}{2} \quad \text{and} \quad \frac{1}{n_k} < \frac{\delta'}{2}. \tag{3.16}$$

Since $|u_{n_k} - v_{n_k}| < 1/n_k$, then

$$|v_{n_k} - c| \le |v_{n_k} - u_{n_k}| + |u_{n_k} - c| < \frac{1}{n_k} + \frac{\delta'}{2} < \delta' \tag{3.17}$$

for $n_k > N$. From (3.16) and (3.17) and the continuity of $f(x)$ we conclude that

$$|f(u_{n_k}) - f(c)| < \frac{\epsilon}{2} \quad \text{and} \quad |f(v_{n_k}) - f(c)| < \frac{\epsilon}{2}.$$

Thus,

$$|f(u_{n_k}) - f(v_{n_k})| \le |f(u_{n_k}) - f(c)| + |f(v_{n_k}) - f(c)| < \epsilon. \tag{3.18}$$

However, as was seen earlier,

$$|f(u_n) - f(v_n)| \ge \epsilon,$$

hence,

$$|f(u_{n_k}) - f(v_{n_k})| \ge \epsilon,$$

which contradicts (3.18). This leads us to assert that $f(x)$ is uniformly continuous on $D$. □


3.4.2. Lipschitz Continuous Functions

Lipschitz continuity is a specialized form of uniform continuity.

Definition 3.4.6. The function $f: D \to R$ is said to satisfy the Lipschitz condition on a set $E \subset D$ if there exist constants, $K$ and $\alpha$, where $K > 0$ and $0 < \alpha \le 1$ such that

$$|f(x_1) - f(x_2)| \le K|x_1 - x_2|^{\alpha}$$

for all $x_1, x_2 \in E$. □

Notationally, whenever $f(x)$ satisfies the Lipschitz condition with constants $K$ and $\alpha$ on a set $E$, we say that it is Lip$(K, \alpha)$ on $E$. In this case, $f(x)$ is called a Lipschitz continuous function. It is easy to see that a Lipschitz continuous function on $E$ is also uniformly continuous there: given $\epsilon > 0$, the choice $\delta = (\epsilon/K)^{1/\alpha}$ gives $|f(x_1) - f(x_2)| \le K|x_1 - x_2|^{\alpha} < K\delta^{\alpha} = \epsilon$ whenever $|x_1 - x_2| < \delta$.

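As an example of a Lipschitz continuous function, consider $f(x) = \sqrt{x}$, where $x \ge 0$. We claim that $\sqrt{x}$ is Lip$(1, \frac{1}{2})$ on its domain. To show this, we first write

$$|\sqrt{x_1} - \sqrt{x_2}|^2 \le |\sqrt{x_1} - \sqrt{x_2}|\,|\sqrt{x_1} + \sqrt{x_2}|,$$

which holds because $|\sqrt{x_1} - \sqrt{x_2}| \le \sqrt{x_1} + \sqrt{x_2}$. Hence,

$$|\sqrt{x_1} - \sqrt{x_2}|^2 \le |x_1 - x_2|.$$

Thus,

$$|\sqrt{x_1} - \sqrt{x_2}| \le |x_1 - x_2|^{1/2},$$

which proves our claim.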
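The following is a quick numerical sanity check of the claim, not a proof. The sketch samples random pairs in an assumed range $[0, 100]$ and confirms that the ratio $|\sqrt{x_1}-\sqrt{x_2}|/|x_1-x_2|^{1/2}$ never exceeds $K = 1$. The helper names `holder_ratio` and `delta_for` are hypothetical. `delta_for` returns $\delta = (\epsilon/K)^{1/\alpha}$, which is one valid choice for uniform continuity of a Lip$(K, \alpha)$ function because $K\delta^{\alpha} = \epsilon$.

```python
import math
import random

def holder_ratio(f, x1, x2, alpha):
    """|f(x1) - f(x2)| / |x1 - x2|**alpha for distinct x1, x2."""
    return abs(f(x1) - f(x2)) / abs(x1 - x2) ** alpha

def delta_for(eps, K, alpha):
    """delta = (eps/K)**(1/alpha): for a Lip(K, alpha) function,
    |x1 - x2| < delta implies |f(x1) - f(x2)| < eps."""
    return (eps / K) ** (1.0 / alpha)

rng = random.Random(0)
worst = 0.0
for _ in range(100_000):
    x1, x2 = rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)
    if x1 != x2:
        worst = max(worst, holder_ratio(math.sqrt, x1, x2, 0.5))

print("largest observed ratio:", worst)          # should not exceed K = 1
print("delta for eps = 0.01:", delta_for(0.01, 1.0, 0.5))
```

For $\sqrt{x}$ with $\epsilon = 0.01$ this gives $\delta = 10^{-4}$. The small $\delta$ reflects the exponent $\alpha = 1/2$.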


3.5. INVERSE FUNCTIONS

From Chapter 1 we recall that one of the basic characteristics of a function $y = f(x)$ is that two values of $y$ are equal if they correspond to the same value of $x$. If we were to reverse the roles of $x$ and $y$ so that two values of $x$ are equal whenever they correspond to the same value of $y$, then $x$ becomes a function of $y$. Such a function is called the inverse function of $f$ and is denoted by $f^{-1}$. We conclude that the inverse of $f: D \to R$ exists if and only if $f$ is one-to-one.

Definition 3.5.1. Let $f: D \to R$. If there exists a function $f^{-1}: f(D) \to D$ such that $f^{-1}[f(x)] = x$ for all $x \in D$ and $f[f^{-1}(y)] = y$ for all $y \in f(D)$, then $f^{-1}$ is called the inverse function of $f$. □

Definition 3.5.2. Let $f: D \to R$. Then, $f$ is said to be monotone increasing [decreasing] on $D$ if whenever $x_1, x_2 \in D$ are such that $x_1 < x_2$, then $f(x_1) \le f(x_2)$ [$f(x_1) \ge f(x_2)$]. The function $f$ is strictly monotone increasing [decreasing] on $D$ if $f(x_1) < f(x_2)$ [$f(x_1) > f(x_2)$] whenever $x_1 < x_2$. □

If $f$ is either monotone increasing or monotone decreasing on $D$, then it is called a monotone function on $D$. In particular, if it is either strictly monotone increasing or strictly monotone decreasing, then $f(x)$ is strictly monotone on $D$.

Strictly monotone functions have the property that their inverse functions exist. This will be shown in the next theorem.

Theorem 3.5.1. Let $f: D \to R$ be strictly monotone increasing (or decreasing) on $D$. Then, there exists a unique inverse function $f^{-1}$, which is strictly monotone increasing (or decreasing) on $f(D)$.

Proof. Let us suppose that $f$ is strictly monotone increasing on $D$. To show that $f^{-1}$ exists as a strictly monotone increasing function on $f(D)$, suppose that $x_1, x_2 \in D$ are such that $f(x_1) = f(x_2) = y$. If $x_1 \ne x_2$, then $x_1 < x_2$ or $x_2 < x_1$. Since $f$ is strictly monotone increasing, then $f(x_1) < f(x_2)$ or $f(x_2) < f(x_1)$. In either case, $f(x_1) \ne f(x_2)$, which contradicts the assumption that $f(x_1) = f(x_2)$. Hence, $x_1 = x_2$, that is, $f$ is one-to-one and has therefore a unique inverse $f^{-1}$.

The inverse $f^{-1}$ is strictly monotone increasing on $f(D)$. To show this, suppose that $f(x_1) < f(x_2)$. Then, $x_1 < x_2$. If not, we must have $x_1 \ge x_2$. In this case, $f(x_1) = f(x_2)$ when $x_1 = x_2$, or $f(x_1) > f(x_2)$ when $x_1 > x_2$, since $f$ is strictly monotone increasing. However, this is contrary to the assumption that $f(x_1) < f(x_2)$. Thus $x_1 < x_2$ and $f^{-1}$ is strictly monotone increasing.

The proof of Theorem 3.5.1 when “increasing” is replaced with “decreasing” is similar. □

Theorem 3.5.2. Suppose that $f: D \to R$ is continuous and strictly monotone increasing (decreasing) on $[a, b] \subset D$. Then, $f^{-1}$ is continuous and strictly monotone increasing (decreasing) on $f([a, b])$.

Proof. By Theorem 3.5.1 we only need to show the continuity of $f^{-1}$. Suppose that $f$ is strictly monotone increasing. The proof when $f$ is strictly monotone decreasing is similar.

Since $f$ is continuous on a closed and bounded interval, then by Corollary 3.4.1 it must achieve its infimum and supremum on $[a, b]$. Furthermore, because $f$ is strictly monotone increasing, its infimum and supremum must be attained at only $a$ and $b$, respectively. Thus

$$f([a, b]) = [f(a), f(b)].$$

Let $d \in [f(a), f(b)]$. There exists a unique value $c$, $a \le c \le b$, such that $f(c) = d$. For any $\epsilon > 0$, let $\tau$ be defined as

$$\tau = \min[f(c) - f(c - \epsilon),\; f(c + \epsilon) - f(c)].$$

Then there exists a $\delta$, $0 < \delta < \tau$, such that all the $x$'s in $[a, b]$ that satisfy the inequality

$$|f(x) - d| < \delta$$

must also satisfy the inequality

$$|x - c| < \epsilon.$$

This is true because

$$f(c) - f(c) + f(c - \epsilon) < d - \delta < f(x) < d + \delta < f(c) + f(c + \epsilon) - f(c),$$

that is,

$$f(c - \epsilon) < f(x) < f(c + \epsilon).$$

Using the fact that $f^{-1}$ is strictly monotone increasing (by Theorem 3.5.1), we conclude that

$$c - \epsilon < x < c + \epsilon,$$

that is, $|x - c| < \epsilon$. It follows that $x = f^{-1}(y)$ is continuous on $[f(a), f(b)]$. □


Note that in general if $y = f(x)$, the equation $y - f(x) = 0$ may not produce a unique solution for $x$ in terms of $y$. If, however, the domain of $f$ can be partitioned into subdomains on each of which $f$ is strictly monotone, then $f$ can have an inverse on each of these subdomains.

Example 3.5.1. Consider the function $f: R \to R$ defined by $y = f(x) = x^3$. It is easy to see that $f$ is strictly monotone increasing for all $x \in R$. It therefore has a unique inverse given by $f^{-1}(y) = y^{1/3}$.

Example 3.5.2. Let $f: [-1, 1] \to R$ be such that $y = f(x) = x^3 - x$. From Figure 3.2 it can be seen that $f$ is strictly monotone increasing on $D_1 = [-1, -3^{-1/2}]$ and $D_2 = [3^{-1/2}, 1]$, but is strictly monotone decreasing on $D_3 = [-3^{-1/2}, 3^{-1/2}]$. This function has therefore three inverses, one on each of $D_1$, $D_2$, and $D_3$. By Theorem 3.5.2, all three inverses are continuous.
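Example 3.5.2 can be made concrete numerically. The sketch below evaluates each of the three inverses by bisection. It is an illustration under stated assumptions, not a method from the text. Bisection is valid here only because $f$ is continuous and strictly monotone on each of $D_1$, $D_2$, $D_3$. The names `inverse_on`, `inv_D1`, `inv_D2`, `inv_D3` are hypothetical.

```python
def f(x):
    return x ** 3 - x

def inverse_on(g, lo, hi, y, increasing, tol=1e-12):
    """Solve g(x) = y for x in [lo, hi] by bisection.

    Assumes g is continuous and strictly monotone on [lo, hi] and that y lies
    between g(lo) and g(hi); `increasing` says which way g is monotone.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # For increasing g, g(mid) < y means the solution lies right of mid;
        # for decreasing g, the same holds when g(mid) > y.
        go_right = g(mid) < y if increasing else g(mid) > y
        if go_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r = 3 ** -0.5   # turning points of x**3 - x are at -r and r

def inv_D1(y):  # D1 = [-1, -r], f increasing
    return inverse_on(f, -1.0, -r, y, increasing=True)

def inv_D2(y):  # D2 = [r, 1], f increasing
    return inverse_on(f, r, 1.0, y, increasing=True)

def inv_D3(y):  # D3 = [-r, r], f decreasing
    return inverse_on(f, -r, r, y, increasing=False)

for inv, x in [(inv_D1, -0.8), (inv_D2, 0.8), (inv_D3, 0.3)]:
    print(x, inv(f(x)))   # each branch recovers x from y = f(x)
```

A single value of $y$ can have several preimages, one per subdomain. Each inverse must therefore be applied only to $y$ values in the image of its own subdomain.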

Example 3.5.3. Let $f: R \to [-1, 1]$ be the function $y = f(x) = \sin x$, where $x$ is measured in radians. There is no unique inverse on $R$, since the sine function is not strictly monotone there. If, however, we restrict the domain of $f$ to $D = [-\pi/2, \pi/2]$, then $f$ is strictly monotone increasing there and has the unique inverse $f^{-1}(y) = \operatorname{Arcsin} y$ (see Example 1.3.4). The inverse of $f$ on $[\pi/2, 3\pi/2]$ is given by $f^{-1}(y) = \pi - \operatorname{Arcsin} y$. We can similarly find the inverse of $f$ on $[3\pi/2, 5\pi/2]$, $[5\pi/2, 7\pi/2]$, etc.
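The branch formulas can be checked with Python's `math.asin`, which returns the principal value (Arcsin) in $[-\pi/2, \pi/2]$. The function `sine_inverse_branch` is a hypothetical helper. Its general formula for the branch $[k\pi - \pi/2, k\pi + \pi/2]$ extends the two cases given in the text and should be read as an assumption.

```python
import math

def sine_inverse_branch(y, k):
    """Inverse of sin x restricted to [k*pi - pi/2, k*pi + pi/2], k an integer.

    k = 0 gives Arcsin y and k = 1 gives pi - Arcsin y, as in the text; the
    general form k*pi + (-1)**k * Arcsin y extrapolates that pattern.
    """
    a = math.asin(y)                    # principal value, in [-pi/2, pi/2]
    return k * math.pi + (a if k % 2 == 0 else -a)

for k in range(-2, 3):
    x = k * math.pi + 0.3               # a point inside the k-th branch
    print(k, x, sine_inverse_branch(math.sin(x), k))
```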







3.6. CONVEX FUNCTIONS

Convex functions are frequently used in operations research. They also happen to be continuous on open intervals, as will be shown in this section. The natural domains for such functions are convex sets.

Definition 3.6.1. A set $D \subset R$ is convex if $\lambda x_1 + (1 - \lambda)x_2 \in D$ whenever $x_1, x_2$ belong to $D$ and $0 \le \lambda \le 1$. Geometrically, a convex set contains the line segment connecting any two of its points. The same definition actually applies to convex sets in the $n$-dimensional Euclidean space ($n \ge 2$). For example, each of the following sets is convex:

1. Any interval in $R$.

2. Any sphere in $R^3$ together with its interior (a solid ball), and in general, any hypersphere in $R^n$, $n \ge 4$, together with its interior.

3. The set $\{(x, y) \in R^2 \mid |x| + |y| \le 1\}$. See Figure 3.3. □

Definition 3.6.2. A function $f: D \to R$ is convex if

$$f[\lambda x_1 + (1 - \lambda)x_2] \le \lambda f(x_1) + (1 - \lambda) f(x_2) \tag{3.19}$$

for all $x_1, x_2 \in D$ and any $\lambda$ such that $0 \le \lambda \le 1$. The function $f$ is strictly convex if inequality (3.19) is strict for $x_1 \ne x_2$ and $0 < \lambda < 1$.

Geometrically, inequality (3.19) means that if $P$ and $Q$ are any two points on the graph of $y = f(x)$, then the portion of the graph between $P$ and $Q$ lies on or below the chord $PQ$ (see Figure 3.4). Examples of convex functions include $f(x) = x^2$ on $R$, $f(x) = \sin x$ on $[\pi, 2\pi]$, $f(x) = e^x$ on $R$, and $f(x) = -\log x$ for $x > 0$, to name just a few. □
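Inequality (3.19) can be spot-checked numerically on a grid of points and weights. The sketch below is only a sanity check, because a finite grid cannot prove convexity. The grid sizes, the tolerance, and the helper name `looks_convex` are arbitrary assumptions. The check accepts the four examples above and rejects $\sin x$ on $[0, \pi]$, where it is concave.

```python
import math

def looks_convex(f, lo, hi, n=60, tol=1e-12):
    """Test inequality (3.19) on an (n+1)-point grid of [lo, hi] and the
    weights lambda = 0, 0.1, ..., 1. Passing is evidence, not proof."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    lams = [j / 10 for j in range(11)]
    for x1 in xs:
        for x2 in xs:
            for lam in lams:
                lhs = f(lam * x1 + (1 - lam) * x2)
                rhs = lam * f(x1) + (1 - lam) * f(x2)
                if lhs > rhs + tol:      # a violation of (3.19)
                    return False
    return True

print(looks_convex(lambda x: x * x, -3, 3))           # x^2 on R
print(looks_convex(math.sin, math.pi, 2 * math.pi))   # sin x on [pi, 2pi]
print(looks_convex(math.exp, -3, 3))                  # e^x on R
print(looks_convex(lambda x: -math.log(x), 0.01, 5))  # -log x, x > 0
print(looks_convex(math.sin, 0, math.pi))             # concave there: fails
```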

Definition 3.6.3. A function $f: D \to R$ is concave if $-f$ is convex. □

We note that if $f: [a, b] \to R$ is convex and the values of $f$ at $a$ and $b$ are finite, then $f(x)$ is bounded from above on $[a, b]$ by $M = \max\{f(a), f(b)\}$. This is true because if $x \in [a, b]$, then $x = \lambda a + (1 - \lambda)b$ for some $\lambda \in [0, 1]$, since $[a, b]$ is a convex set. Hence,

$$f(x) \le \lambda f(a) + (1 - \lambda) f(b) \le \lambda M + (1 - \lambda)M = M.$$

Figure 3.3. The set $\{(x, y) \in R^2 \mid |x| + |y| \le 1\}$.

Figure 3.4. The graph of a convex function.

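The function $f(x)$ is also bounded from below. To show this, we first note that any $x \in [a, b]$ can be written as

$$x = \frac{a + b}{2} + t,$$

where

$$a - \frac{a + b}{2} \le t \le b - \frac{a + b}{2}.$$

Now,

$$f\left(\frac{a + b}{2}\right) = f\left[\frac{1}{2}\left(\frac{a + b}{2} + t\right) + \frac{1}{2}\left(\frac{a + b}{2} - t\right)\right] \le \frac{1}{2} f\left(\frac{a + b}{2} + t\right) + \frac{1}{2} f\left(\frac{a + b}{2} - t\right), \tag{3.20}$$

since if $(a + b)/2 + t$ belongs to $[a, b]$, then so does $(a + b)/2 - t$. From (3.20) we then have

$$f\left(\frac{a + b}{2} + t\right) \ge 2f\left(\frac{a + b}{2}\right) - f\left(\frac{a + b}{2} - t\right).$$

Since

$$f\left(\frac{a + b}{2} - t\right) \le M,$$

then

$$f\left(\frac{a + b}{2} + t\right) \ge 2f\left(\frac{a + b}{2}\right) - M,$$

that is, $f(x) \ge m$ for all $x \in [a, b]$, where

$$m = 2f\left(\frac{a + b}{2}\right) - M.$$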
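Here is a short numerical check of the two bounds $M$ and $m$ for one convex function. The choice $f(x) = (x - 0.3)^2$ on $[-1, 2]$ is an arbitrary illustration, and `convex_bounds` is a hypothetical helper name. The bound $m$ is valid but is typically far below the actual minimum.

```python
def convex_bounds(f, a, b):
    """M = max{f(a), f(b)} and m = 2 f((a+b)/2) - M, the bounds derived in
    the text for a convex f on [a, b]."""
    M = max(f(a), f(b))
    m = 2 * f((a + b) / 2) - M
    return m, M

f = lambda x: (x - 0.3) ** 2      # an illustrative convex function
m, M = convex_bounds(f, -1.0, 2.0)
values = [f(-1.0 + 3.0 * i / 1000) for i in range(1001)]   # grid on [-1, 2]
print("m =", m, " M =", M)
print("grid min and max:", min(values), max(values))
print("bounds hold on the grid:", m <= min(values) and max(values) <= M)
```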


Another interesting property of convex functions is given in Theorem 3.6.1.

Theorem 3.6.1. Let $f: D \to R$ be a convex function, where $D$ is an open interval. Then $f$ is Lip$(K, 1)$ on any closed interval $[a, b]$ contained in $D$, that is,

$$|f(x_1) - f(x_2)| \le K|x_1 - x_2| \tag{3.21}$$

for all $x_1, x_2 \in [a, b]$.


Proof. Consider the closed interval $[a - \epsilon, b + \epsilon]$, where $\epsilon > 0$ is chosen so that this interval is contained in $D$. Let $m'$ and $M'$ be, respectively, the lower and upper bounds of $f$ (as was seen earlier) on $[a - \epsilon, b + \epsilon]$. Let $x_1, x_2$ be any two distinct points in $[a, b]$. Define $z_1$ and $\lambda$ as

$$z_1 = x_2 + \frac{\epsilon(x_2 - x_1)}{|x_1 - x_2|}, \qquad \lambda = \frac{|x_1 - x_2|}{\epsilon + |x_1 - x_2|}.$$

Then $z_1 \in [a - \epsilon, b + \epsilon]$. This is true because $(x_2 - x_1)/|x_1 - x_2|$ is either equal to 1 or to $-1$. Since $x_2 \in [a, b]$, then

$$a - \epsilon \le x_2 - \epsilon \le x_2 + \frac{\epsilon(x_2 - x_1)}{|x_1 - x_2|} \le x_2 + \epsilon \le b + \epsilon.$$

Furthermore, since $z_1 - x_1 = (x_2 - x_1)(|x_1 - x_2| + \epsilon)/|x_1 - x_2|$, we have $\lambda(z_1 - x_1) = x_2 - x_1$, that is,

$$x_2 = \lambda z_1 + (1 - \lambda)x_1.$$

We then have

$$f(x_2) \le \lambda f(z_1) + (1 - \lambda) f(x_1) = \lambda[f(z_1) - f(x_1)] + f(x_1).$$

Thus,

$$f(x_2) - f(x_1) \le \lambda[f(z_1) - f(x_1)] \le \lambda[M' - m'] \le \frac{|x_1 - x_2|}{\epsilon}(M' - m') = K|x_1 - x_2|, \tag{3.22}$$

where $K = (M' - m')/\epsilon$. Since inequality (3.22) is true for any $x_1, x_2 \in [a, b]$, we must also have

$$f(x_1) - f(x_2) \le K|x_1 - x_2|. \tag{3.23}$$

From inequalities (3.22) and (3.23) we conclude that

$$|f(x_1) - f(x_2)| \le K|x_1 - x_2|$$

for any $x_1, x_2 \in [a, b]$, which shows that $f(x)$ is Lip$(K, 1)$ on $[a, b]$. □

Using Theorem 3.6.1 it is easy to prove the following corollary:


Corollary 3.6.1. Let $f: D \to R$ be a convex function, where $D$ is an open interval. If $[a, b]$ is any closed interval contained in $D$, then $f(x)$ is uniformly continuous on $[a, b]$ and is therefore continuous on $D$.
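The constant $K = (M' - m')/\epsilon$ from the proof of Theorem 3.6.1 can be computed directly. Take $M' = \max\{f(a-\epsilon), f(b+\epsilon)\}$ and $m' = 2f((a+b)/2) - M'$, the bounds derived earlier for convex functions, since $[a - \epsilon, b + \epsilon]$ has the same midpoint $(a+b)/2$. The sketch assumes $f(x) = x^2$, $[a, b] = [0, 1]$, and $\epsilon = 0.5$, all illustrative choices. It then compares $K$ with the largest difference quotient observed on a grid. The helper name `lipschitz_bound` is hypothetical.

```python
def lipschitz_bound(f, a, b, eps):
    """K = (M' - m')/eps as in the proof of Theorem 3.6.1, with M' and m'
    the upper and lower bounds of f on [a - eps, b + eps] derived earlier."""
    lo, hi = a - eps, b + eps
    M_ = max(f(lo), f(hi))
    m_ = 2 * f((lo + hi) / 2) - M_
    return (M_ - m_) / eps

f = lambda x: x * x                  # convex on D = R (illustrative choice)
K = lipschitz_bound(f, 0.0, 1.0, 0.5)
grid = [i / 200 for i in range(201)]
worst = max(abs(f(x1) - f(x2)) / abs(x1 - x2)
            for x1 in grid for x2 in grid if x1 != x2)
print("K from the proof:", K)
print("largest observed |f(x1) - f(x2)| / |x1 - x2|:", worst)
```

The bound from the proof ($K = 8$ here) is valid but far from sharp. The true Lipschitz constant of $x^2$ on $[0, 1]$ is 2.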


Note that if $f(x)$ is convex on $[a, b]$, then it does not have to be continuous at the end points of the interval. It is easy to see, for example, that the function $f: [-1, 1] \to R$ defined as

$$f(x) = \begin{cases} x^2, & -1 < x < 1, \\ 2, & x = 1, -1, \end{cases}$$

is convex on $[-1, 1]$, but is discontinuous at $x = -1, 1$.


3.7. CONTINUOUS AND CONVEX FUNCTIONS IN STATISTICS

The most vivid examples of continuous functions in statistics are perhaps the cumulative distribution functions of continuous random variables. If $X$ is a continuous random variable, then its cumulative distribution function

$$F(x) = P(X \le x)$$

is continuous on $R$. In this case,

$$P(X = a) = \lim_{n \to \infty}\left[F\left(a + \frac{1}{n}\right) - F\left(a - \frac{1}{n}\right)\right] = 0,$$

that is, the distribution of $X$ assigns a zero probability to any single value. This is a basic characteristic of continuous random variables.

Examples of continuous distributions include the beta, Cauchy, chi-squared, exponential, gamma, Laplace, logistic, lognormal, normal, $t$, uniform, and Weibull distributions. Most of these distributions are described in introductory statistics books. A detailed account of their properties and uses is given in the two books by Johnson and Kotz (1970a, 1970b).

It is interesting to note that if $X$ is any random variable (not necessarily continuous), then its cumulative distribution function, $F(x)$, is right-continuous on $R$ (see, for example, Harris, 1966, page 55). This function is also monotone increasing on $R$. If $F(x)$ is strictly monotone increasing, then by Theorem 3.5.1 it has a unique inverse $F^{-1}(y)$. In this case, if $Y$ has the uniform distribution over the open interval $(0, 1)$, then the random variable $F^{-1}(Y)$ has the cumulative distribution function $F(x)$. To show this, consider $X = F^{-1}(Y)$. Then, using the fact that $F$ and $F^{-1}$ are strictly monotone increasing and that $P(Y \le u) = u$ for $0 < u < 1$,

$$P(X \le x) = P[F^{-1}(Y) \le x] = P[Y \le F(x)] = F(x).$$

This result has an interesting application in sampling. If $Y_1, Y_2, \ldots, Y_n$ form an independent random sample from the uniform distribution $U(0, 1)$, then $F^{-1}(Y_1), F^{-1}(Y_2), \ldots, F^{-1}(Y_n)$ will form an independent sample from a distribution with the cumulative distribution function $F(x)$. In other words, samples from any distribution whose cumulative distribution function is strictly increasing (as assumed above) can be generated through sampling from the uniform distribution $U(0, 1)$. This result forms the cornerstone of Monte Carlo simulation in statistics. Such a method provides an artificial way of collecting “data.” There are situations where the actual taking of a physical sample is either impossible or too expensive. In such situations, useful information can often be derived from simulated sampling. Monte Carlo simulation is also used in the study of the relative performance of test statistics and parameter estimators when the data come from certain specified parent distributions.
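Inverse-transform sampling, as described above, takes only a few lines of code. The sketch assumes the exponential distribution $F(x) = 1 - e^{-x/\theta}$ for $x > 0$, which is the distribution used in Exercise 3.21. Its inverse is $F^{-1}(y) = -\theta\log(1 - y)$. Python's `random.Random.random()` supplies the $U(0, 1)$ draws. It returns values in $[0, 1)$, which is harmless here. The function names are hypothetical.

```python
import math
import random

def exponential_quantile(y, theta):
    """F^{-1}(y) = -theta * log(1 - y) for F(x) = 1 - exp(-x/theta), x > 0."""
    return -theta * math.log(1.0 - y)

def inverse_transform_sample(quantile, n, rng):
    """Return F^{-1}(Y_1), ..., F^{-1}(Y_n) with Y_i independent U(0, 1) draws."""
    return [quantile(rng.random()) for _ in range(n)]

rng = random.Random(2003)
theta = 2.0
sample = inverse_transform_sample(lambda y: exponential_quantile(y, theta), 100_000, rng)
mean = sum(sample) / len(sample)
print("sample mean:", round(mean, 3), "(theoretical mean theta =", theta, ")")
```

The same `inverse_transform_sample` works for any $F$ whose inverse can be evaluated. Only the `quantile` argument changes.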

Another example of the use of continuous functions in statistics is in limit theory. For example, it is known that if $\{X_n\}_{n=1}^{\infty}$ is a sequence of random variables that converges in probability to $c$, and if $g(x)$ is a continuous function at $x = c$, then the random variable $g(X_n)$ converges in probability to $g(c)$ as $n \to \infty$. By definition, a sequence of random variables $\{X_n\}_{n=1}^{\infty}$ converges in probability to a constant $c$ if for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - c| \ge \epsilon) = 0.$$

In particular, if $\{X_n\}_{n=1}^{\infty}$ is a sequence of estimators of a parameter $c$, then $X_n$ is said to be a consistent estimator of $c$ if $X_n$ converges in probability to $c$. For example, the sample mean

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

of a sample of size $n$ from a population with a finite mean $\mu$ is a consistent estimator of $\mu$ according to the law of large numbers (see, for example, Lindgren, 1976, page 155). Other types of convergence in statistics can be found in standard mathematical statistics books (see the annotated bibliography).
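Convergence in probability of the sample mean can be illustrated by simulation. The sketch below assumes $U(0, 1)$ data (so $\mu = 0.5$), a tolerance $\epsilon = 0.05$, and 1000 replications per sample size, all arbitrary choices. In line with the law of large numbers, the estimated $P(|\bar{X}_n - \mu| \ge \epsilon)$ should shrink toward 0 as $n$ grows. The function name is hypothetical.

```python
import random

def prob_far_from_mean(n, eps, reps, rng):
    """Monte Carlo estimate of P(|Xbar_n - mu| >= eps) for U(0, 1) data, mu = 0.5."""
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            hits += 1
    return hits / reps

rng = random.Random(1)
estimates = {n: prob_far_from_mean(n, 0.05, 1000, rng) for n in (10, 100, 1000)}
for n, p in estimates.items():
    print(f"n = {n:4d}: estimated P(|Xbar_n - 0.5| >= 0.05) = {p:.3f}")
```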

Convex functions also play an important role in statistics, as can be seen from the following examples.

If $f(x)$ is a convex function and $X$ is a random variable with a finite mean $\mu = E(X)$, then

$$E[f(X)] \ge f[E(X)].$$

Equality holds if $X$ is constant with probability 1. This inequality is known as Jensen's inequality. If $f$ is strictly convex, the inequality is strict unless $X$ is constant with probability 1. A proof of Jensen's inequality is given in Section 6.7.4. See also Lehmann (1983, page 50).
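Jensen's inequality is easy to verify for a discrete random variable, where expectations are finite sums. The distribution below, with values 0.5, 1, 4 and probabilities 0.2, 0.5, 0.3, is an arbitrary assumption. The three convex functions are taken from the examples following Definition 3.6.2. `expect` is a hypothetical helper.

```python
import math

def expect(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete X with P(X = values[i]) = probs[i]."""
    return sum(p * g(x) for x, p in zip(values, probs))

values = [0.5, 1.0, 4.0]   # an arbitrary discrete distribution
probs = [0.2, 0.5, 0.3]

convex_functions = {
    "x^2": lambda x: x * x,
    "exp(x)": math.exp,
    "-log(x)": lambda x: -math.log(x),
}

results = {}
for name, f in convex_functions.items():
    lhs = expect(values, probs, f)     # E[f(X)]
    rhs = f(expect(values, probs))     # f(E[X])
    results[name] = (lhs, rhs)
    print(f"{name:8s} E[f(X)] = {lhs:.4f} >= f(E[X]) = {rhs:.4f}")
```

All three functions here are strictly convex and $X$ is not constant, so each inequality is strict. A one-point distribution such as `expect([2.0], [1.0], f)` gives equality.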

Jensen's inequality has useful applications in statistics. For example, it can be used to show that if $x_1, x_2, \ldots, x_n$ are $n$ positive scalars, then their arithmetic mean is greater than or equal to their geometric mean, which is equal to $(\prod_{i=1}^{n} x_i)^{1/n}$. This can be shown as follows:

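Consider the convex function $f(x) = -\log x$. Let $X$ be a discrete random variable that takes the values $x_1, x_2, \ldots, x_n$ with probabilities equal to $1/n$, that is,

$$P(X = x) = \begin{cases} \dfrac{1}{n}, & x = x_1, x_2, \ldots, x_n, \\ 0, & \text{otherwise}. \end{cases}$$

Then, by Jensen's inequality,

$$E(-\log X) \ge -\log E(X). \tag{3.24}$$

However,

$$E(-\log X) = -\frac{1}{n} \sum_{i=1}^{n} \log x_i, \tag{3.25}$$

and

$$-\log E(X) = -\log\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right). \tag{3.26}$$

By using (3.25) and (3.26) in (3.24) we get

$$\frac{1}{n} \sum_{i=1}^{n} \log x_i \le \log\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right),$$

or

$$\log\left(\prod_{i=1}^{n} x_i\right)^{1/n} \le \log\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right).$$

Since the logarithmic function is strictly monotone increasing, we conclude that

$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i.$$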


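Jensen's inequality can also be used to show that the arithmetic mean is greater than or equal to the harmonic mean. This assertion can be shown as follows:

Consider the function $f(x) = x^{-1}$, which is convex for $x > 0$. If $X$ is a random variable with $P(X > 0) = 1$, then by Jensen's inequality,

$$E\left(\frac{1}{X}\right) \ge \frac{1}{E(X)}. \tag{3.27}$$

In particular, if $X$ has the discrete distribution described earlier, then

$$E\left(\frac{1}{X}\right) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}$$

and

$$E(X) = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

By substitution in (3.27) we get

$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i} \ge \left(\frac{1}{n} \sum_{i=1}^{n} x_i\right)^{-1},$$

or

$$\frac{1}{n} \sum_{i=1}^{n} x_i \ge \left(\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}\right)^{-1}. \tag{3.28}$$

The quantity on the right of inequality (3.28) is the harmonic mean of $x_1, x_2, \ldots, x_n$.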
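Both inequalities derived from Jensen's inequality can be checked numerically: arithmetic mean $\ge$ geometric mean, and arithmetic mean $\ge$ harmonic mean. In the sketch below the geometric mean is computed as $\exp\{(1/n)\sum \log x_i\}$, which mirrors the use of $-\log x$ in the derivation. The data $1, 2, 4, 8$ are an arbitrary choice, and `means` is a hypothetical helper.

```python
import math

def means(xs):
    """Arithmetic, geometric and harmonic means of positive numbers xs."""
    n = len(xs)
    am = sum(xs) / n
    gm = math.exp(sum(math.log(x) for x in xs) / n)   # (prod x_i)^(1/n) via logs
    hm = n / sum(1.0 / x for x in xs)
    return am, gm, hm

am, gm, hm = means([1.0, 2.0, 4.0, 8.0])
print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
print("AM >= GM:", am >= gm, " AM >= HM:", am >= hm)
```

When all the $x_i$ are equal, $X$ is constant with probability 1, and the three means coincide.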





Another example of the use of convex functions in statistics is in the general theory of estimation. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population whose distribution depends on an unknown parameter $\theta$. Let $\omega(X_1, X_2, \ldots, X_n)$ be an estimator of $\theta$. By definition, the loss function $L[\theta, \omega(X_1, X_2, \ldots, X_n)]$ is a nonnegative function that measures the loss incurred when $\theta$ is estimated by $\omega(X_1, X_2, \ldots, X_n)$. The expected value (mean) of the loss function is called the risk function, denoted by $R(\theta, \omega)$, that is,

$$R(\theta, \omega) = E\{L[\theta, \omega(X_1, X_2, \ldots, X_n)]\}.$$

The loss function is taken to be a convex function of $\theta$. Examples of loss functions include the squared error loss,

$$L[\theta, \omega(X_1, X_2, \ldots, X_n)] = [\theta - \omega(X_1, X_2, \ldots, X_n)]^2,$$

and the absolute error loss,

$$L[\theta, \omega(X_1, X_2, \ldots, X_n)] = |\theta - \omega(X_1, X_2, \ldots, X_n)|.$$

The first loss function is strictly convex, whereas the second is convex, but not strictly convex.

The goodness of an estimator of $\theta$ is judged on the basis of its risk function, assuming that a certain loss function has been selected. The smaller the risk, the more desirable the estimator. An estimator $\omega^*(X_1, X_2, \ldots, X_n)$ is said to be admissible if there is no other estimator $\omega(X_1, X_2, \ldots, X_n)$ of $\theta$ such that

$$R(\theta, \omega) \le R(\theta, \omega^*)$$

for all $\theta \in \Omega$ ($\Omega$ is the parameter space), with strict inequality for at least one $\theta$. An estimator $\omega_0(X_1, X_2, \ldots, X_n)$ is said to be a minimax estimator if it minimizes the supremum (with respect to $\theta$) of the risk function, that is,

$$\sup_{\theta \in \Omega} R(\theta, \omega_0) \le \sup_{\theta \in \Omega} R(\theta, \omega),$$

where $\omega(X_1, X_2, \ldots, X_n)$ is any other estimator of $\theta$. It should be noted that a minimax estimator may not be admissible.

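Example 3.7.1. Let $X_1, X_2, \ldots, X_{20}$ be a random sample of size 20 from the normal distribution $N(\theta, 1)$, $-\infty < \theta < \infty$. Let $\omega_1(X_1, X_2, \ldots, X_{20}) = \bar{X}_{20}$ be the sample mean, and let $\omega_2(X_1, X_2, \ldots, X_{20}) = 0$. Then, using a squared error loss function,

$$R(\theta, \omega_1) = E[(\bar{X}_{20} - \theta)^2] = \mathrm{Var}(\bar{X}_{20}) = \frac{1}{20},$$

$$R(\theta, \omega_2) = E[(0 - \theta)^2] = \theta^2.$$

In this case,

$$\sup_{\theta} R(\theta, \omega_1) = \frac{1}{20},$$

whereas

$$\sup_{\theta} R(\theta, \omega_2) = \infty.$$

Thus $\omega_1 = \bar{X}_{20}$ is a better estimator than $\omega_2 = 0$ in the minimax sense. It can be shown that $\bar{X}_{20}$ is the minimax estimator of $\theta$ with respect to a squared error loss function. Note, however, that $\bar{X}_{20}$ does not have a smaller risk than $\omega_2 = 0$ for every $\theta$, since

$$R(\theta, \omega_1) < R(\theta, \omega_2)$$

for $\theta > 20^{-1/2}$ or $\theta < -20^{-1/2}$. However, for $-20^{-1/2} < \theta < 20^{-1/2}$,

$$R(\theta, \omega_2) < R(\theta, \omega_1).$$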
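The risk functions of Example 3.7.1 can be tabulated, and $R(\theta, \omega_1) = 1/20$ can be confirmed by Monte Carlo. The sketch uses Python's `random.Random.gauss` to draw $N(\theta, 1)$ samples. The value $\theta = 1.5$, the seed, and the number of replications are arbitrary. The table shows the two risk curves crossing at $|\theta| = 20^{-1/2}$. The function names are hypothetical.

```python
import random

N = 20  # sample size in Example 3.7.1

def risk_mean(theta):
    """R(theta, omega_1) = Var(Xbar_20) = 1/20 under squared error loss."""
    return 1.0 / N

def risk_zero(theta):
    """R(theta, omega_2) = E[(0 - theta)^2] = theta^2."""
    return theta ** 2

def mc_risk_mean(theta, reps, rng):
    """Monte Carlo estimate of E[(Xbar_20 - theta)^2] for N(theta, 1) samples."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(N)) / N
        total += (xbar - theta) ** 2
    return total / reps

rng = random.Random(42)
est = mc_risk_mean(1.5, 20_000, rng)
print("simulated R(1.5, omega_1):", est, "exact:", risk_mean(1.5))

crossover = N ** -0.5   # the two risks are equal at |theta| = 20**(-1/2)
for theta in (0.0, 0.1, crossover, 0.5, 2.0):
    print(f"theta={theta:.4f}  R1={risk_mean(theta):.4f}  R2={risk_zero(theta):.4f}")
```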


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Corwin, L. J., and R. H. Szczarba (1982). Multivariable Calculus. Dekker, New York.

Fisz, M. (1963). Probability Theory and Mathematical Statistics, 3rd ed. Wiley, New York. (Some continuous distributions are described in Chap. 5. Limit theorems concerning sequences of random variables are discussed in Chap. 6.)

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York.

Hardy, G. H. (1955). A Course of Pure Mathematics, 10th ed. The University Press, Cambridge, England. (Section 89 of Chap. 4 gives definitions concerning the o and O symbols introduced in Section 3.3 of this chapter.)

Harris, B. (1966). Theory of Probability. Addison-Wesley, Reading, Massachusetts. (Some continuous distributions are given in Section 3.5.)

Henle, J. M., and E. M. Kleinberg (1979). Infinitesimal Calculus. The MIT Press, Cambridge, Massachusetts.

Hillier, F. S., and G. J. Lieberman (1967). Introduction to Operations Research. Holden-Day, San Francisco. (Convex sets and functions are described in Appendix 1.)

Hogg, R. V., and A. T. Craig (1965). Introduction to Mathematical Statistics, 2nd ed. Macmillan, New York. (Loss and risk functions are discussed in Section 9.3.)

Hyslop, J. M. (1954). Infinite Series, 5th ed. Oliver and Boyd, Edinburgh, England. (Chap. 1 gives definitions and summaries of results concerning the o, O notation.)

Johnson, N. L., and S. Kotz (1970a). Continuous Univariate Distributions-1. Houghton Mifflin, Boston.

Johnson, N. L., and S. Kotz (1970b). Continuous Univariate Distributions-2. Houghton Mifflin, Boston.

Lehmann, E. L. (1983). Theory of Point Estimation. Wiley, New York. (Section 1.6 discusses convex functions and their uses as loss functions.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (The concepts of loss and utility, or negative loss, used in statistical decision theory are discussed in Chap. 8.)

Randles, R. H., and D. A. Wolfe (1979). Introduction to the Theory of Nonparametric Statistics. Wiley, New York. (Some mathematical statistics results, including Jensen's inequality, are given in the Appendix.)

Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New York.

Roberts, A. W., and D. E. Varberg (1973). Convex Functions. Academic Press, New York. (This is a handy reference book that contains all the central facts about convex functions.)

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, Reading, Massachusetts. (Chap. 12 defines and provides discussions concerning admissible and minimax estimators.)

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New York. (Limits of functions and some properties of continuous functions are given in Chap. 4.)

Sagan, H. (1974). Advanced Calculus. Houghton Mifflin, Boston.

Smith, D. E. (1958). History of Mathematics, Vol. 1. Dover, New York.


EXERCISES 
In Mathematics 

3.1. Determine if the following limits exist:


- 1 


(a) 

lim — 

x^l X 

-1 ’ 

(b) 

1 

lim x: sin — , 



(c) 

l 

lim sin — 

/sin 

x^O \ X j 

/ 1 






lim /( X ) , where f(x) 

x^O 


^ 7 ^ ’ 

— 1 

2{x-l) ’ 

V 


X > 0, 

X < 0. 


3.2. Show that 

(a) tan x^ = o(x^) as x ^ 0. 

(b) $x = o(\sqrt{x})$ as $x \to 0$,

(c) $O(1) = o(x)$ as $x \to \infty$,

(d) $f(x)g(x) = x^{-1} + O(1)$ as $x \to 0$, where $f(x) = x + o(x^2)$, $g(x) = x^{-2} + O(x^{-1})$.

3.3. Determine where the following functions are continuous, and indicate the points of discontinuity (if any):

(a) $f(x) = \begin{cases} x \sin(1/x), & x \ne 0, \\ 0, & x = 0, \end{cases}$

(b) 

f(x) = { 

f l(x- i)/(2-x)]^''\ 

X # 2, 

il, 

X = 2, 

(c) $f(x) = \begin{cases} x^{-m/n}, & x \ne 0, \\ 0, & x = 0, \end{cases}$

where $m$ and $n$ are positive integers,

x^- 2x^ + 3 

(d) f(x) = — — , x#l. 


3.4. Show that $f(x)$ is continuous at $x = a$ if and only if it is both left-continuous and right-continuous at $x = a$.

3.5. Use Definition 3.4.1 to show that the function


/(x) =x^-l 


is continuous at any point $a \in R$.


3.6. For what values of $x$ is

$$f(x) = \lim_{n \to \infty} \frac{3nx}{1 - nx}$$

continuous?





3.7. Consider the function

$$f(x) = \frac{x - |x|}{x}, \quad -1 < x < 1, \; x \ne 0.$$

Can $f(x)$ be defined at $x = 0$ so that it will be continuous there?

3.8. Let $f(x)$ be defined for all $x \in R$ and continuous at $x = 0$. Furthermore,

$$f(a + b) = f(a) + f(b),$$

for all $a, b$ in $R$. Show that $f(x)$ is uniformly continuous everywhere in $R$.


3.9. Let $f(x)$ be defined as


2x — 1, 


0 <x < 1, 


P / \ V/ i ✓V i J. ^ 

\x^ — 5x^ + 5, l<x<2. 

Determine if $f(x)$ is uniformly continuous on $[0, 2]$.

3.10. Show that $f(x) = \cos x$ is uniformly continuous on $R$.


3.11. Prove Theorem 3.4.1. 


3.12. A function $f: D \to R$ is called upper semicontinuous at $a \in D$ if for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$f(x) < f(a) + \epsilon$$

for all $x \in N_\delta(a) \cap D$. If the above inequality is replaced with

$$f(x) > f(a) - \epsilon,$$

then $f(x)$ is said to be lower semicontinuous.

Show that if $D$ is closed and bounded, then

(a) $f(x)$ is bounded from above on $D$ if $f(x)$ is upper semicontinuous.

(b) $f(x)$ is bounded from below on $D$ if $f(x)$ is lower semicontinuous.

3.13. Let $f: [a, b] \to R$ be continuous such that $f(x) = 0$ for every rational number in $[a, b]$. Show that $f(x) = 0$ for every $x$ in $[a, b]$.

3.14. For what values of $x$ does the function

$$f(x) = 3 + |x - 1| + |x + 1|$$

have a unique inverse?





3.15. Let /: R defined as 

Find the inverse function 

3.16. Let f(x) = 2x^ — 8x + 8. Find the inverse of f(x) for 

(a) X < — 2, 

(b) X > 2. 

3.17. Suppose that $f: [a, b] \to R$ is a convex function. Show that for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$\sum_{i=1}^{n} |f(a_i) - f(b_i)| < \epsilon$$

for every finite, pairwise disjoint family of open subintervals $\{(a_i, b_i)\}_{i=1}^{n}$ of $[a, b]$ for which $\sum_{i=1}^{n} (b_i - a_i) < \delta$.

Note: A function satisfying this property is said to be absolutely continuous on $[a, b]$.

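3.18. Let $f: [a, b] \to R$ be a convex function. Show that if $a_1, a_2, \ldots, a_n$ are positive numbers and $x_1, x_2, \ldots, x_n$ are points in $[a, b]$, then

$$f\left(\frac{\sum_{i=1}^{n} a_i x_i}{A}\right) \le \frac{\sum_{i=1}^{n} a_i f(x_i)}{A},$$

where $A = \sum_{i=1}^{n} a_i$.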

3.19. Let $f(x)$ be continuous on $D \subset R$. Let $S$ be the set of all $x \in D$ such that $f(x) = 0$. Show that $S$ is a closed set.

3.20. Let $f(x)$ be a convex function on $D \subset R$. Show that $\exp[f(x)]$ is also convex on $D$.


In Statistics 

3.21. Let $X$ be a continuous random variable with the cumulative distribution function

$$F(x) = 1 - e^{-x/\theta}, \quad x > 0.$$

This is known as the exponential distribution. Its mean and variance are $\mu = \theta$, $\sigma^2 = \theta^2$, respectively. Generate a random sample of five observations from an exponential distribution with mean 2.

[Hint: Select a ten-digit number from the table of random numbers, for example, 8389611097. Divide it by $10^{10}$ to obtain the decimal number 0.8389611097. This number can be regarded as an observation from the uniform distribution $U(0, 1)$. Now, solve the equation $F(x) = 0.8389611097$. The resulting value of $x$ is considered as an observation from the prescribed exponential distribution. Repeat this process four more times, each time selecting a new decimal number from the table of random numbers.]

3.22. Verify Jensen's inequality in each of the following two cases:

(a) $X$ is normally distributed and $f(x) = |x|$.

(b) $X$ has the exponential distribution and $f(x) = e^{-x}$.

3.23. Use the definition of convergence in probability to verify that if the sequence of random variables converges in probability to zero, then so does the sequence

3.24. Show that

$$E(X^2) \ge [E(|X|)]^2.$$

[Hint: Let $Y = |X|$. Apply Jensen's inequality to $Y$ with $f(x) = x^2$.] Deduce that if $X$ has a mean $\mu$ and a variance $\sigma^2$, then

$$E(|X - \mu|) \le \sigma.$$


3.25. Consider the exponential distribution described in Exercise 3.21. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from this distribution. Consider the following estimators of $\theta$:

(a) $\omega_1(X_1, X_2, \ldots, X_n) = \bar{X}_n$, the sample mean.

(b) $\omega_2(X_1, X_2, \ldots, X_n) = \bar{X}_n + 1$,

(c) ^ 3 (Xi,X 2 ,...,X„)=X„. 

Determine the risk function corresponding to a squared error loss function for each one of these estimators. Which estimator has the smallest risk for all values of $\theta$?



CHAPTER 4 


Differentiation


Differentiation originated in connection with the problems of drawing tangents to curves and of finding maxima and minima of functions. Pierre de Fermat (1601-1665), the founder of the modern theory of numbers, is credited with having put forth the main ideas on which differential calculus is based.

In this chapter, we shall introduce the notion of differentiation and study its applications in the determination of maxima and minima of functions. We shall restrict our attention to real-valued functions defined on $R$, the set of real numbers. The study of differentiation in connection with multivariable functions, that is, functions defined on $R^n$ ($n > 1$), will be considered in Chapter 7.


4.1. THE DERIVATIVE OF A FUNCTION

The notion of differentiation was motivated by the need to find the tangent to a curve at a given point. Fermat's approach to this problem was inspired by geometric reasoning. His method uses the idea of a tangent as the limiting position of a secant when two of its points of intersection with the curve tend to coincide. This has led to the modern notation associated with the derivative of a function, which we now introduce.

Definition 4.1.1. Let $f(x)$ be a function defined in a neighborhood $N_r(x_0)$ of a point $x_0$. Consider the ratio

$$\phi(h) = \frac{f(x_0 + h) - f(x_0)}{h}, \tag{4.1}$$

where $h$ is a nonzero increment of $x_0$ such that $-r < h < r$. If $\phi(h)$ has a limit as $h \to 0$, then this limit is called the derivative of $f(x)$ at $x_0$ and is denoted by $f'(x_0)$. It is also common to use the notation

$$\left.\frac{df(x)}{dx}\right|_{x = x_0} = f'(x_0).$$

We thus have

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}. \tag{4.2}$$

By putting $x = x_0 + h$, formula (4.2) can be written as

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$

If $f'(x_0)$ exists, then $f(x)$ is said to be differentiable at $x = x_0$. Geometrically, $f'(x_0)$ is the slope of the tangent to the graph of the function $y = f(x)$ at the point $(x_0, y_0)$, where $y_0 = f(x_0)$. If $f(x)$ has a derivative at every point of a set $D$, then $f(x)$ is said to be differentiable on $D$.

It is important to note that in order for $f'(x_0)$ to exist, the left-sided and right-sided limits of $\phi(h)$ in formula (4.1) must exist and be equal as $h \to 0$, or as $x$ approaches $x_0$ from either side. It is possible to consider only one-sided derivatives at $x = x_0$. These occur when $\phi(h)$ has just a one-sided limit as $h \to 0$. We shall not, however, concern ourselves with such derivatives in this chapter. □
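Definition 4.1.1 suggests a direct numerical experiment: evaluate $\phi(h)$ for shrinking $h$ of both signs and watch the two one-sided values approach a common limit. The sketch assumes $f(x) = e^x$ at $x_0 = 1$, where the derivative is known to be $e$. In floating-point arithmetic $h$ cannot be taken too small, because the subtraction $f(x_0 + h) - f(x_0)$ loses accuracy. The loop therefore stops at $h = 10^{-7}$. The name `difference_quotient` is hypothetical.

```python
import math

def difference_quotient(f, x0, h):
    """phi(h) = [f(x0 + h) - f(x0)] / h, as in formula (4.1); h must be nonzero."""
    return (f(x0 + h) - f(x0)) / h

x0 = 1.0
for k in range(1, 8):
    h = 10.0 ** -k
    right = difference_quotient(math.exp, x0, h)    # h > 0
    left = difference_quotient(math.exp, x0, -h)    # h < 0
    print(f"h = 1e-{k}:  right = {right:.9f}  left = {left:.9f}")
print("exact f'(1) = e =", math.e)
```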


Functions that are differentiable at a point must necessarily be continuous there. This will be shown in the next theorem.

Theorem 4.1.1. Let $f(x)$ be defined in a neighborhood of a point $x_0$. If $f(x)$ has a derivative at $x_0$, then it must be continuous at $x_0$.

Proof. From Definition 4.1.1 we can write

$$f(x_0 + h) - f(x_0) = h\phi(h).$$

If the derivative of $f(x)$ exists at $x_0$, then $\phi(h) \to f'(x_0)$ as $h \to 0$. It follows from Theorem 3.2.1(2) that

$$f(x_0 + h) - f(x_0) \to 0$$

as $h \to 0$, since $h\phi(h) \to 0 \cdot f'(x_0) = 0$. Thus for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$|f(x_0 + h) - f(x_0)| < \epsilon$$

if $|h| < \delta$. This indicates that $f(x)$ is continuous at $x_0$. □


It should be noted that even though continuity is a necessary condition for differentiability, it is not a sufficient condition, as can be seen from the following example. Let $f(x)$ be defined as

$$f(x)=\begin{cases} x\sin\dfrac{1}{x}, & x\neq 0,\\[4pt] 0, & x=0.\end{cases}$$

This function is continuous at $x=0$, since $f(0)=\lim_{x\to 0}f(x)=0$ by the fact that

$$\left|x\sin\frac{1}{x}\right|\le |x|$$

for all $x$. However, $f(x)$ is not differentiable at $x=0$. This is because, at $x_0=0$,

$$\phi(h)=\frac{f(h)-f(0)}{h}=\frac{h\sin(1/h)-0}{h}=\sin\frac{1}{h},\qquad\text{since } h\neq 0,$$

which does not have a limit as $h\to 0$. Hence, $f'(0)$ does not exist.
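The failure of $\phi(h)=\sin(1/h)$ to converge can also be seen numerically. Take the illustrative sequence $h_k = 1/(\pi/2 + k\pi)$, which is our choice and tends to 0. Along it the quotient alternates between $+1$ and $-1$. This is a minimal sketch of the argument above, not a proof.

```python
import math


def f(x):
    """f(x) = x sin(1/x) for x != 0, and f(0) = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0


def phi(h):
    # Difference quotient of f at x0 = 0; equals sin(1/h) for h != 0.
    return (f(h) - f(0.0)) / h


# h_k -> 0, but sin(1/h_k) = sin(pi/2 + k*pi) = (-1)**k keeps alternating.
values = [phi(1.0 / (math.pi / 2 + k * math.pi)) for k in range(6)]
print([round(v, 6) for v in values])
```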

If $f(x)$ is differentiable on a set $D$, then $f'(x)$ is a function defined on $D$. In the event $f'(x)$ itself is differentiable on $D$, then its derivative is called the second derivative of $f(x)$ and is denoted by $f''(x)$. It is also common to use the notation

$$\frac{d^2f(x)}{dx^2}=f''(x).$$

By the same token, we can define the $n$th ($n>2$) derivative of $f(x)$ as the derivative of the $(n-1)$st derivative of $f(x)$. We denote this derivative by

$$\frac{d^nf(x)}{dx^n}=f^{(n)}(x),\qquad n=2,3,\ldots.$$

We shall now discuss some rules that pertain to differentiation. The reader is expected to know how to differentiate certain elementary functions such as polynomial, exponential, and trigonometric functions.





Theorem 4.1.2. Let $f(x)$ and $g(x)$ be defined and differentiable on a set $D$. Then

1. $[\alpha f(x)+\beta g(x)]'=\alpha f'(x)+\beta g'(x)$, where $\alpha$ and $\beta$ are constants.

2. $[f(x)g(x)]'=f'(x)g(x)+f(x)g'(x)$.

3. $[f(x)/g(x)]'=[f'(x)g(x)-f(x)g'(x)]/g^2(x)$ if $g(x)\neq 0$.

Proof. The proof of (1) is straightforward. To prove (2) we write

$$\begin{aligned}
\lim_{h\to 0}\frac{f(x+h)g(x+h)-f(x)g(x)}{h}
&=\lim_{h\to 0}\frac{[f(x+h)-f(x)]g(x+h)+f(x)[g(x+h)-g(x)]}{h}\\
&=\lim_{h\to 0}g(x+h)\,\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}+f(x)\lim_{h\to 0}\frac{g(x+h)-g(x)}{h}.
\end{aligned}$$

However, $\lim_{h\to 0}g(x+h)=g(x)$, since $g(x)$ is continuous (because it is differentiable). Hence,

$$\lim_{h\to 0}\frac{f(x+h)g(x+h)-f(x)g(x)}{h}=g(x)f'(x)+f(x)g'(x).$$

Now, to prove (3) we write

$$\begin{aligned}
\lim_{h\to 0}\frac{f(x+h)/g(x+h)-f(x)/g(x)}{h}
&=\lim_{h\to 0}\frac{g(x)f(x+h)-f(x)g(x+h)}{h\,g(x)g(x+h)}\\
&=\lim_{h\to 0}\frac{g(x)[f(x+h)-f(x)]-f(x)[g(x+h)-g(x)]}{h\,g(x)g(x+h)}\\
&=\frac{\lim_{h\to 0}\{g(x)[f(x+h)-f(x)]/h-f(x)[g(x+h)-g(x)]/h\}}{g(x)\lim_{h\to 0}g(x+h)}\\
&=\frac{g(x)f'(x)-f(x)g'(x)}{g^2(x)}. \qquad □
\end{aligned}$$
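Parts 2 and 3 of Theorem 4.1.2 can be sanity-checked numerically. The sketch below uses the illustrative pair $f(x)=\sin x$, $g(x)=e^x$ at $x=0.7$, which is our choice. It also relies on a central-difference approximation of the derivative, which is our assumption and not a method from the text. A check of this kind is a plausibility test, not a proof.

```python
import math


def central_diff(func, x, h=1e-5):
    """Central-difference approximation of func'(x); truncation error is O(h^2)."""
    return (func(x + h) - func(x - h)) / (2 * h)


f, df = math.sin, math.cos      # f and its known derivative
g, dg = math.exp, math.exp      # g and its known derivative

x = 0.7
# Product rule: [f g]' = f' g + f g'
product_numeric = central_diff(lambda t: f(t) * g(t), x)
product_rule = df(x) * g(x) + f(x) * dg(x)

# Quotient rule: [f / g]' = (f' g - f g') / g^2, valid because g(x) != 0
quotient_numeric = central_diff(lambda t: f(t) / g(t), x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

print(product_numeric, product_rule)
print(quotient_numeric, quotient_rule)
```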





Theorem 4.1.3 (The Chain Rule). Let $f: D_1\to R$ and $g: D_2\to R$ be two functions. Suppose that $f(D_1)\subset D_2$. If $f(x)$ is differentiable on $D_1$ and $g(x)$ is differentiable on $D_2$, then the composite function $h(x)=g[f(x)]$ is differentiable on $D_1$ and

$$\frac{dg[f(x)]}{dx}=\frac{dg[f(x)]}{df(x)}\,\frac{df(x)}{dx}.$$

Proof. Let $z=f(x)$ and $t=f(x+h)$. By the fact that $g(z)$ is differentiable we can write

$$g[f(x+h)]-g[f(x)]=g(t)-g(z)=(t-z)g'(z)+o(t-z), \tag{4.3}$$

where, if we recall, the $o$-notation was introduced in Section 3.3. We then have

$$\frac{g[f(x+h)]-g[f(x)]}{h}=\frac{t-z}{h}\,g'(z)+\frac{o(t-z)}{t-z}\cdot\frac{t-z}{h}. \tag{4.4}$$

As $h\to 0$, $t\to z$, and hence

$$\lim_{h\to 0}\frac{t-z}{h}=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=\frac{df(x)}{dx}.$$

Now, by taking the limits of both sides of (4.4) as $h\to 0$ and noting that

$$\lim_{h\to 0}\frac{o(t-z)}{t-z}=\lim_{t\to z}\frac{o(t-z)}{t-z}=0,$$

we conclude that

$$\frac{dg[f(x)]}{dx}=\frac{df(x)}{dx}\,\frac{dg[f(x)]}{df(x)}. \qquad □$$
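The chain rule can be checked in the same spirit. The sketch below takes the illustrative composition $g[f(x)]=\sin(x^2)$, which is our choice, for which the theorem gives $\cos(x^2)\cdot 2x$. It compares that value with a central-difference estimate at $x=1.3$. The names `inner` and `outer` are hypothetical helpers.

```python
import math


def central_diff(func, x, h=1e-5):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def inner(x):      # f(x) = x^2
    return x * x


def outer(z):      # g(z) = sin z
    return math.sin(z)


x = 1.3
numeric = central_diff(lambda t: outer(inner(t)), x)
# dg/df evaluated at z = f(x), multiplied by df/dx
chain_rule = math.cos(inner(x)) * (2 * x)
print(numeric, chain_rule)
```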

Note 4.1.1. We recall that $f(x)$ must be continuous in order for $f'(x)$ to exist. However, if $f'(x)$ exists, it does not have to be continuous. Care should be exercised when showing that $f'(x)$ is continuous. For example, let us consider the function

$$f(x)=\begin{cases} x^2\sin\dfrac{1}{x}, & x\neq 0,\\[4pt] 0, & x=0.\end{cases}$$





Suppose that it is desired to show that $f'(x)$ exists, and if so, to determine if it is continuous. To do so, let us first find out if $f'(x)$ exists at $x=0$:

$$f'(0)=\lim_{h\to 0}\frac{f(h)-f(0)}{h}=\lim_{h\to 0}\frac{h^2\sin(1/h)}{h}=\lim_{h\to 0}h\sin\frac{1}{h}=0.$$

Thus the derivative of $f(x)$ exists at $x=0$ and is equal to zero. For $x\neq 0$, it is clear that the derivative of $f(x)$ exists. By applying Theorem 4.1.2 and using our knowledge of the derivatives of elementary functions, $f'(x)$ can be written as

$$f'(x)=\begin{cases} 2x\sin\dfrac{1}{x}-\cos\dfrac{1}{x}, & x\neq 0,\\[4pt] 0, & x=0.\end{cases}$$

We note that $f'(x)$ exists for all $x$, but is not continuous at $x=0$, since

$$\lim_{x\to 0}f'(x)=\lim_{x\to 0}\left(2x\sin\frac{1}{x}-\cos\frac{1}{x}\right)$$

does not exist, because $\cos(1/x)$ has no limit as $x\to 0$. However, for any nonzero value of $x$, $f'(x)$ is continuous.
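The discontinuity of $f'$ at the origin can be observed numerically. Along the illustrative sequence $x_k = 1/(k\pi)$, which is our choice and tends to 0, the formula for $f'(x)$ gives $2x_k\sin(k\pi)-\cos(k\pi) = (-1)^{k+1}$. So $f'(x_k)$ keeps alternating between $+1$ and $-1$ even though $f'(0)=0$. A minimal sketch:

```python
import math


def f_prime(x):
    """Derivative of f(x) = x^2 sin(1/x) (with f(0) = 0), as computed in Note 4.1.1."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


for k in range(1, 7):
    xk = 1 / (k * math.pi)
    print(f"x = {xk:.5f}   f'(x) = {f_prime(xk):+.5f}")
```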

If $f(x)$ is a convex function, then we have the following interesting result, whose proof can be found in Roberts and Varberg (1973, Theorem C, page 7):

Theorem 4.1.4. If $f:(a,b)\to R$ is convex on the open interval $(a,b)$, then the set $S$ where $f'(x)$ fails to exist is either finite or countable. Moreover, $f'(x)$ is continuous on $(a,b)-S$, the complement of $S$ with respect to $(a,b)$.

For example, the function $f(x)=|x|$ is convex on $R$. Its derivative does not exist at $x=0$ (why?), but is continuous everywhere else.

The sign of $f'(x)$ provides information about the behavior of $f(x)$ in a neighborhood of $x$. More specifically, we have the following theorem:

Theorem 4.1.5. Let $f: D\to R$, where $D$ is an open set. Suppose that $f'(x)$ is positive at a point $x_0\in D$. Then there is a neighborhood $N_\delta(x_0)\subset D$ such that for each $x$ in this neighborhood, $f(x)>f(x_0)$ if $x>x_0$, and $f(x)<f(x_0)$ if $x<x_0$.





Proof. Let $\epsilon=f'(x_0)/2$. Then there exists a $\delta>0$, which can be taken small enough that $N_\delta(x_0)\subset D$, such that

$$f'(x_0)-\epsilon<\frac{f(x)-f(x_0)}{x-x_0}<f'(x_0)+\epsilon$$

if $0<|x-x_0|<\delta$. Hence, if $x>x_0$,

$$f(x)-f(x_0)>\frac{1}{2}(x-x_0)f'(x_0),$$

which shows that $f(x)>f(x_0)$ since $f'(x_0)>0$. Furthermore, since

$$\frac{f(x)-f(x_0)}{x-x_0}>0,$$

then $f(x)<f(x_0)$ if $x<x_0$. □

If $f'(x_0)<0$, it can be similarly shown that $f(x)<f(x_0)$ if $x>x_0$, and $f(x)>f(x_0)$ if $x<x_0$.


4.2. THE MEAN VALUE THEOREM 

This is one of the most important theorems in differential calculus. It is also known as the theorem of the mean. Before proving the mean value theorem, let us prove a special case of it known as Rolle’s theorem.

Theorem 4.2.1 (Rolle’s Theorem). Let $f(x)$ be continuous on the closed interval $[a,b]$ and differentiable on the open interval $(a,b)$. If $f(a)=f(b)$, then there exists a point $c$, $a<c<b$, such that $f'(c)=0$.

Proof. Let $d$ denote the common value of $f(a)$ and $f(b)$. Define $h(x)=f(x)-d$. Then $h(a)=h(b)=0$. If $h(x)$ is also zero on $(a,b)$, then $h'(x)=0$ for $a<x<b$ and the theorem is proved. Let us therefore assume that $h(x)\neq 0$ for some $x\in(a,b)$. Since $h(x)$ is continuous on $[a,b]$ [because $f(x)$ is], then by Corollary 3.4.1 it must achieve its supremum $M$ at a point $\xi$ in $[a,b]$, and its infimum $m$ at a point $\eta$ in $[a,b]$. If $h(x)>0$ for some $x\in(a,b)$, then we must obviously have $a<\xi<b$, because $h(x)$ vanishes at both end points. We now claim that $h'(\xi)=0$. If $h'(\xi)>0$ or $<0$, then by Theorem 4.1.5, there exists a point $x_1$ in a neighborhood $N_{\delta_1}(\xi)\subset(a,b)$ at which $h(x_1)>h(\xi)$, a contradiction, since $h(\xi)=M$. Thus $h'(\xi)=0$, which implies that $f'(\xi)=0$, since $h'(x)=f'(x)$ for all $x\in(a,b)$. We can similarly arrive at the conclusion that $f'(\eta)=0$ if $h(x)<0$ for some $x\in(a,b)$. In this case, if $h'(\eta)\neq 0$, then by Theorem 4.1.5 there exists a point $x_2$ in a neighborhood $N_{\delta_2}(\eta)\subset(a,b)$ at which $h(x_2)<h(\eta)=m$, a contradiction, since $m$ is the infimum of $h(x)$ over $[a,b]$.

Thus in both cases, whether $h(x)>0$ or $<0$ for some $x\in(a,b)$, we must have a point $c$, $a<c<b$, such that $f'(c)=0$. □

Rolle’s theorem has the following geometric interpretation: If $f(x)$ satisfies the conditions of Theorem 4.2.1, then the graph of $y=f(x)$ must have a tangent line that is parallel to the $x$-axis at some point $c$ between $a$ and $b$. Note that there can be several points like $c$ inside $(a,b)$. For example, the function $y=x^3-5x^2+3x-1$ satisfies the conditions of Rolle’s theorem on the interval $[a,b]$, where $a=0$ and $b=(5+\sqrt{13})/2$. In this case, $f(a)=f(b)=-1$, and $f'(x)=3x^2-10x+3$ vanishes at $x=\frac{1}{3}$ and $x=3$.
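The claims in this example can be verified directly. The sketch below, which is only an illustration, checks that $f(0)=f(b)=-1$. It then finds the roots of $f'(x)$ with the quadratic formula and confirms that both lie strictly between $a$ and $b$.

```python
import math


def f(x):
    return x ** 3 - 5 * x ** 2 + 3 * x - 1


a, b = 0.0, (5 + math.sqrt(13)) / 2
print(f(a), f(b))            # both equal -1 up to floating-point rounding

# Roots of f'(x) = 3x^2 - 10x + 3 by the quadratic formula
disc = math.sqrt(10 ** 2 - 4 * 3 * 3)
roots = sorted(((10 - disc) / 6, (10 + disc) / 6))
print(roots)                 # 1/3 and 3, both inside (a, b)
```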

Theorem 4.2.2 (The Mean Value Theorem). If $f(x)$ is continuous on the closed interval $[a,b]$ and differentiable on the open interval $(a,b)$, then there exists a point $c$, $a<c<b$, such that

$$f(b)=f(a)+(b-a)f'(c).$$

Proof. Consider the function

$$\Phi(x)=f(x)-f(a)-A(x-a),$$

where

$$A=\frac{f(b)-f(a)}{b-a}.$$

The function $\Phi(x)$ is continuous on $[a,b]$ and is differentiable on $(a,b)$, since $\Phi'(x)=f'(x)-A$. Furthermore, $\Phi(a)=\Phi(b)=0$. It follows from Rolle’s theorem that there exists a point $c$, $a<c<b$, such that $\Phi'(c)=0$. Thus,

$$f'(c)=\frac{f(b)-f(a)}{b-a},$$

which proves the theorem. □

The mean value theorem also has a nice geometric interpretation. If the graph of the function $y=f(x)$ has a tangent line at each point of its length between two points $P_1$ and $P_2$ (see Figure 4.1), then there must be a point $Q$ between $P_1$ and $P_2$ at which the tangent line is parallel to the secant line through $P_1$ and $P_2$. Note that there can be several points on the curve between $P_1$ and $P_2$ that have the same property as $Q$, as can be seen from Figure 4.1.

The mean value theorem is useful in the derivation of several interesting results, as will be seen in the remainder of this chapter.
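The mean value theorem guarantees a point $c$ but says nothing about how to locate it. When $f'$ is continuous and $f'(x)-A$ changes sign on $[a,b]$, one simple approach is to bisect on $f'(x)-A$, where $A=[f(b)-f(a)]/(b-a)$ is the secant slope. This approach is our own assumption and is not a method from the text. The sketch applies it to the hypothetical choice $f(x)=x^3$ on $[0,2]$. There $A=4$, and solving $3c^2=4$ gives $c=2/\sqrt{3}\approx 1.1547$. The helper name `mvt_point` is ours.

```python
import math


def mvt_point(f, f_prime, a, b, tol=1e-12):
    """Find c in (a, b) with f'(c) = [f(b) - f(a)] / (b - a) by bisection.

    Assumes f' is continuous and f'(x) - A changes sign on [a, b];
    otherwise bisection cannot locate c.
    """
    A = (f(b) - f(a)) / (b - a)           # slope of the secant line
    lo, hi = a, b
    s_lo = f_prime(lo) - A
    if s_lo * (f_prime(hi) - A) > 0:
        raise ValueError("f'(x) - A does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        s_mid = f_prime(mid) - A
        if s_lo * s_mid <= 0:             # sign change in [lo, mid]
            hi = mid
        else:                             # sign change in [mid, hi]
            lo, s_lo = mid, s_mid
    return (lo + hi) / 2


c = mvt_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(c, 2 / math.sqrt(3))
```

If several points share the property of $Q$, bisection returns only one of them.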






Figure 4.1. Tangent lines parallel to the secant line.


Corollary 4.2.1. If $f(x)$ has a derivative $f'(x)$ that is nonnegative (nonpositive) on an interval $(a,b)$, then $f(x)$ is monotone increasing (decreasing) on $(a,b)$. If $f'(x)$ is positive (negative) on $(a,b)$, then $f(x)$ is strictly monotone increasing (decreasing) there.

Proof. Let $x_1$ and $x_2$ be two points in $(a,b)$ such that $x_1<x_2$. By the mean value theorem, there exists a point $x_0$, $x_1<x_0<x_2$, such that

$$f(x_2)=f(x_1)+(x_2-x_1)f'(x_0).$$

If $f'(x_0)\ge 0$, then $f(x_2)\ge f(x_1)$ and $f(x)$ is monotone increasing. Similarly, if $f'(x_0)\le 0$, then $f(x_2)\le f(x_1)$ and $f(x)$ is monotone decreasing. If, however, $f'(x)>0$, or $f'(x)<0$ on $(a,b)$, then strict monotonicity follows over $(a,b)$. □

Theorem 4.2.3. If $f(x)$ is monotone increasing [decreasing] on an interval $(a,b)$, and if $f(x)$ is differentiable there, then $f'(x)\ge 0$ [$f'(x)\le 0$] on $(a,b)$.

Proof. Let $x_0\in(a,b)$. There exists a neighborhood $N_r(x_0)\subset(a,b)$. Then, for any $x\in N_r(x_0)$ such that $x\neq x_0$, the ratio

$$\frac{f(x)-f(x_0)}{x-x_0}$$

is nonnegative. This is true because $f(x)\ge f(x_0)$ if $x>x_0$ and $f(x)\le f(x_0)$ if $x<x_0$. By taking the limit of this ratio as $x\to x_0$, we claim that $f'(x_0)\ge 0$. To prove this claim, suppose that $f'(x_0)<0$. Then there exists a $\delta>0$ such that

$$\left|\frac{f(x)-f(x_0)}{x-x_0}-f'(x_0)\right|<-\frac{1}{2}f'(x_0)$$

if $0<|x-x_0|<\delta$. It follows that

$$\frac{f(x)-f(x_0)}{x-x_0}<\frac{1}{2}f'(x_0)<0.$$

Thus $f(x)<f(x_0)$ if $x>x_0$, which is a contradiction. Hence, $f'(x_0)\ge 0$. A similar argument can be used to show that $f'(x_0)\le 0$ when $f(x)$ is monotone decreasing. □


Note that strict monotonicity on a set $D$ does not necessarily imply that $f'(x)>0$, or $f'(x)<0$, for all $x$ in $D$. For example, the function $f(x)=x^3$ is strictly monotone increasing for all $x$, but $f'(0)=0$.

We recall from Theorem 3.5.1 that strict monotonicity of $f(x)$ is a sufficient condition for the existence of the inverse function $f^{-1}$. The next theorem shows that under certain conditions, $f^{-1}$ is a differentiable function.


Theorem 4.2.4. Suppose that $f(x)$ is strictly monotone increasing (or decreasing) and continuous on an interval $[a,b]$. If $f'(x)$ exists and is different from zero at $x_0\in(a,b)$, then the inverse function $f^{-1}$ is differentiable at $y_0=f(x_0)$ and its derivative is given by

$$\left.\frac{df^{-1}(y)}{dy}\right|_{y=y_0}=\frac{1}{f'(x_0)}.$$

Proof. By Theorem 3.5.2, $f^{-1}(y)$ exists and is continuous. Let $x_0\in(a,b)$, and let $N_r(x_0)\subset(a,b)$ for some $r>0$. Then, for any $x\in N_r(x_0)$ with $x\neq x_0$,

$$\frac{f^{-1}(y)-f^{-1}(y_0)}{y-y_0}=\frac{x-x_0}{f(x)-f(x_0)}=\frac{1}{[f(x)-f(x_0)]/(x-x_0)}, \tag{4.5}$$

where $y=f(x)$. Now, since both $f$ and $f^{-1}$ are continuous, then $x\to x_0$ if and only if $y\to y_0$. By taking the limits of all sides in formula (4.5), we conclude that the derivative of $f^{-1}$ at $y_0$ exists and is equal to

$$\left.\frac{df^{-1}(y)}{dy}\right|_{y=y_0}=\frac{1}{f'(x_0)}. \qquad □$$
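Theorem 4.2.4 can be seen at work on the illustrative strictly increasing function $f(x)=x^3+x$, which is our choice. Its inverse has no simple closed form. The sketch therefore computes $f^{-1}$ by bisection, differentiates it numerically at $y_0=f(x_0)$ with $x_0=1$, and compares the result with $1/f'(x_0)=1/4$. The bracketing interval $[-10,10]$ and the name `f_inverse` are assumptions of this sketch.

```python
def f(x):
    return x ** 3 + x              # strictly increasing, f'(x) = 3x^2 + 1 > 0


def f_inverse(y, lo=-10.0, hi=10.0, tol=1e-13):
    """Solve f(x) = y by bisection; assumes f(lo) <= y <= f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x0 = 1.0
y0 = f(x0)                          # y0 = 2
h = 1e-4
numeric = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2 * h)
print(numeric, 1 / (3 * x0 ** 2 + 1))
```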


The following theorem gives a more general version of the mean value theorem. It is due to Augustin-Louis Cauchy and has an important application in calculating certain limits, as will be seen later in this chapter.





Theorem 4.2.5 (Cauchy’s Mean Value Theorem). If $f(x)$ and $g(x)$ are continuous on the closed interval $[a,b]$ and differentiable on the open interval $(a,b)$, then there exists a point $c$, $a<c<b$, such that

$$[f(b)-f(a)]g'(c)=[g(b)-g(a)]f'(c).$$

Proof. The proof is based on using Rolle’s theorem in a manner similar to that of Theorem 4.2.2. Define the function $\psi(x)$ as

$$\psi(x)=[f(b)-f(x)][g(b)-g(a)]-[f(b)-f(a)][g(b)-g(x)].$$

This function is continuous on $[a,b]$ and is differentiable on $(a,b)$, since

$$\psi'(x)=-f'(x)[g(b)-g(a)]+g'(x)[f(b)-f(a)].$$

Furthermore, $\psi(a)=\psi(b)=0$. Thus by Rolle’s theorem, there exists a point $c$, $a<c<b$, such that $\psi'(c)=0$, that is,

$$-f'(c)[g(b)-g(a)]+g'(c)[f(b)-f(a)]=0. \tag{4.6}$$

In particular, if $g(b)-g(a)\neq 0$ and $f'(x)$ and $g'(x)$ do not vanish at the same point in $(a,b)$, then formula (4.6) can be written as

$$\frac{f'(c)}{g'(c)}=\frac{f(b)-f(a)}{g(b)-g(a)}. \qquad □$$
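For a concrete instance, take the illustrative pair $f(x)=x^2$ and $g(x)=x^3$ on $[0,1]$, which is our choice. Formula (4.6) becomes $-2c+3c^2=0$, whose root in $(0,1)$ is $c=2/3$. The sketch below uses exact rational arithmetic from Python's `fractions` module to check (4.6) and the resulting ratio.

```python
from fractions import Fraction


def f(x):
    return x ** 2


def g(x):
    return x ** 3


def f_prime(x):
    return 2 * x


def g_prime(x):
    return 3 * x ** 2


a, b = Fraction(0), Fraction(1)
# Formula (4.6) here reads -2c + 3c^2 = 0, whose root in (0, 1) is c = 2/3
c = Fraction(2, 3)

lhs = -f_prime(c) * (g(b) - g(a)) + g_prime(c) * (f(b) - f(a))
print("left side of (4.6) at c:", lhs)
print("f'(c)/g'(c) =", f_prime(c) / g_prime(c),
      "  [f(b)-f(a)]/[g(b)-g(a)] =", (f(b) - f(a)) / (g(b) - g(a)))
```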


An immediate application of this theorem is a very popular method in calculating the limits of certain ratios of functions. This method is attributed to Guillaume François Marquis de l’Hospital (1661-1704) and is known as l’Hospital’s rule. It deals with the limit of the ratio $f(x)/g(x)$ as $x\to a$ when both the numerator and the denominator tend simultaneously to zero or to infinity as $x\to a$. In either case, we have what is called an indeterminate ratio caused by having $0/0$ or $\infty/\infty$ as $x\to a$.

Theorem 4.2.6 (l’Hospital’s Rule). Let $f(x)$ and $g(x)$ be continuous on the closed interval $[a,b]$ and differentiable on the open interval $(a,b)$. Suppose that we have the following:

1. $g(x)$ and $g'(x)$ are not zero at any point inside $(a,b)$.

2. $\lim_{x\to a^+}f'(x)/g'(x)$ exists.

3. $f(x)\to 0$ and $g(x)\to 0$ as $x\to a^+$, or $f(x)\to\infty$ and $g(x)\to\infty$ as $x\to a^+$.

Then,

$$\lim_{x\to a^+}\frac{f(x)}{g(x)}=\lim_{x\to a^+}\frac{f'(x)}{g'(x)}.$$





Proof. For the sake of simplicity, we shall drop the $+$ sign from $a^+$ and simply write $a$ when $x$ approaches $a$ from the right. Let us consider the following cases:

Case 1. $f(x)\to 0$ and $g(x)\to 0$ as $x\to a$, where $a$ is finite. Let $x\in(a,b)$. By applying Cauchy’s mean value theorem on the interval $[a,x]$ we get

$$\frac{f(x)}{g(x)}=\frac{f(x)-f(a)}{g(x)-g(a)}=\frac{f'(c)}{g'(c)},$$

where $a<c<x$. Note that $f(a)=g(a)=0$, since $f(x)$ and $g(x)$ are continuous and their limits are equal to zero when $x\to a$. Now, as $x\to a$, $c\to a$; hence

$$\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{c\to a}\frac{f'(c)}{g'(c)}=\lim_{x\to a}\frac{f'(x)}{g'(x)}.$$

Case 2. $f(x)\to 0$ and $g(x)\to 0$ as $x\to\infty$. Let $z=1/x$. As $x\to\infty$, $z\to 0^+$. Then

$$\lim_{x\to\infty}\frac{f(x)}{g(x)}=\lim_{z\to 0^+}\frac{f_1(z)}{g_1(z)}, \tag{4.7}$$

where $f_1(z)=f(1/z)$ and $g_1(z)=g(1/z)$. These functions are continuous since $f(x)$ and $g(x)$ are, and $z\neq 0$ as $z\to 0^+$ (see Theorem 3.4.2). Here, we find it necessary to set $f_1(0)=g_1(0)=0$ so that $f_1(z)$ and $g_1(z)$ will be continuous at $z=0$, since their limits are equal to zero. This is equivalent to defining $f(x)$ and $g(x)$ to be zero at infinity in the extended real number system. Furthermore, by the chain rule of Theorem 4.1.3 we have

$$f_1'(z)=f'(x)\left(-\frac{1}{z^2}\right),\qquad g_1'(z)=g'(x)\left(-\frac{1}{z^2}\right).$$

If we now apply Case 1, we get

$$\lim_{z\to 0^+}\frac{f_1(z)}{g_1(z)}=\lim_{z\to 0^+}\frac{f_1'(z)}{g_1'(z)}=\lim_{x\to\infty}\frac{f'(x)}{g'(x)}. \tag{4.8}$$





From (4.7) and (4.8) we then conclude that

$$\lim_{x\to\infty}\frac{f(x)}{g(x)}=\lim_{x\to\infty}\frac{f'(x)}{g'(x)}.$$


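Case 3. $f(x)\to\infty$ and $g(x)\to\infty$ as $x\to a$, where $a$ is finite. Let $\lim_{x\to a}[f'(x)/g'(x)]=L$. Then for a given $\epsilon>0$ there exists a $\delta>0$ such that $a+\delta<b$ and

$$\left|\frac{f'(x)}{g'(x)}-L\right|<\epsilon \tag{4.9}$$

if $a<x<a+\delta$. By applying Cauchy’s mean value theorem on the interval $[x,a+\delta]$ we get

$$\frac{f(x)-f(a+\delta)}{g(x)-g(a+\delta)}=\frac{f'(d)}{g'(d)},$$

where $x<d<a+\delta$. From inequality (4.9) we then have

$$\left|\frac{f(x)-f(a+\delta)}{g(x)-g(a+\delta)}-L\right|<\epsilon$$

for all $x$ such that $a<x<a+\delta$. It follows that

$$\lim_{x\to a}\frac{f(x)-f(a+\delta)}{g(x)-g(a+\delta)}=\lim_{x\to a}\frac{f(x)}{g(x)}\,\lim_{x\to a}\frac{1-f(a+\delta)/f(x)}{1-g(a+\delta)/g(x)}=\lim_{x\to a}\frac{f(x)}{g(x)},$$

since both $f(x)$ and $g(x)$ tend to $\infty$ as $x\to a$. Since $\epsilon$ is arbitrary, we conclude that $\lim_{x\to a}f(x)/g(x)=L$.

Case 4. $f(x)\to\infty$ and $g(x)\to\infty$ as $x\to\infty$. This can be easily shown by using the techniques applied in Cases 2 and 3.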


Case 5. $\lim_{x\to a}f'(x)/g'(x)=\infty$, where $a$ is finite or infinite. Let us consider the ratio $g(x)/f(x)$. We have that

$$\lim_{x\to a}\frac{g'(x)}{f'(x)}=0.$$

Hence,

$$\lim_{x\to a}\frac{g(x)}{f(x)}=\lim_{x\to a}\frac{g'(x)}{f'(x)}=0.$$

If $A$ is any positive number, then there exists a $\delta>0$ such that

$$0<\frac{g(x)}{f(x)}<\frac{1}{A}$$

if $a<x<a+\delta$. Thus for such values of $x$,

$$\frac{f(x)}{g(x)}>A,$$

which implies that

$$\lim_{x\to a}\frac{f(x)}{g(x)}=\infty.$$

When applying l’Hospital’s rule, the ratio $f'(x)/g'(x)$ may assume the indeterminate form $0/0$ or $\infty/\infty$ as $x\to a$. In this case, higher-order derivatives of $f(x)$ and $g(x)$, assuming such derivatives exist, will be needed. In general, if the first $n-1$ derivatives of $f(x)$ and $g(x)$ tend simultaneously to zero or to $\infty$ as $x\to a$, and if the $n$th derivatives, $f^{(n)}(x)$ and $g^{(n)}(x)$, exist and satisfy the same conditions as those imposed on $f'(x)$ and $g'(x)$ in Theorem 4.2.6, then

$$\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{x\to a}\frac{f^{(n)}(x)}{g^{(n)}(x)}. \qquad □$$

A Historical Note

According to Eves (1976, page 342), in 1696 the Marquis de l’Hospital assembled the lecture notes of his teacher Johann Bernoulli (1667-1748) into the world’s first textbook on calculus. In this book, the so-called l’Hospital’s rule is found. It is perhaps more accurate to refer to this rule as the Bernoulli-l’Hospital rule. Note that the name l’Hospital follows the old French spelling and the letter s is not to be pronounced. In modern French this name is spelled as l’Hôpital.


Example 4.2.1. $\displaystyle\lim_{x\to 0}\frac{\sin x}{x}=\lim_{x\to 0}\frac{\cos x}{1}=1.$

This is a well-known limit. It implies that $\sin x$ and $x$ are asymptotically equal, that is, $\sin x\sim x$ as $x\to 0$ (see Section 3.3).

Example 4.2.2. $\displaystyle\lim_{x\to 0}\frac{1-\cos x}{x^2}=\lim_{x\to 0}\frac{\sin x}{2x}=\lim_{x\to 0}\frac{\cos x}{2}=\frac{1}{2}.$

We note here that l’Hospital’s rule was applied twice before reaching the limit $\frac{1}{2}$.





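Example 4.2.3. $\displaystyle\lim_{x\to\infty}\frac{a^x}{x}$, where $a>1$.

This is of the form $\infty/\infty$ as $x\to\infty$. Since $a^x=e^{x\log a}$, then

$$\lim_{x\to\infty}\frac{a^x}{x}=\lim_{x\to\infty}\frac{e^{x\log a}(\log a)}{1}=\infty.$$

This is also a well-known limit. On the basis of this result it can be shown (see Exercise 4.12) that the following hold:

1. $\displaystyle\lim_{x\to\infty}\frac{a^x}{x^m}=\infty$, where $a>1$, $m>0$.

2. $\displaystyle\lim_{x\to\infty}\frac{\log x}{x^m}=0$, where $m>0$.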

Example 4.2.4. $\lim_{x\to 0^+}x^x$.

This is of the form $0^0$ as $x\to 0^+$, which is indeterminate. It can be reduced to the form $0/0$ or $\infty/\infty$ so that l’Hospital’s rule can apply. To do so we write $x^x$ as

$$x^x=e^{x\log x}.$$

However,

$$x\log x=\frac{\log x}{1/x},$$

which is of the form $-\infty/\infty$ as $x\to 0^+$. By l’Hospital’s rule we then have

$$\lim_{x\to 0^+}(x\log x)=\lim_{x\to 0^+}\frac{1/x}{-1/x^2}=\lim_{x\to 0^+}(-x)=0.$$

It follows that

$$\lim_{x\to 0^+}x^x=\lim_{x\to 0^+}e^{x\log x}=\exp\left[\lim_{x\to 0^+}(x\log x)\right]=1.$$
Example 4.2.5. $\displaystyle\lim_{x\to\infty}x\log\left(\frac{x+1}{x-1}\right)$.

This is of the form $\infty\times 0$ as $x\to\infty$, which is indeterminate. But

$$x\log\left(\frac{x+1}{x-1}\right)=\frac{\log\left(\dfrac{x+1}{x-1}\right)}{1/x}$$

is of the form $0/0$ as $x\to\infty$. Hence,

$$\lim_{x\to\infty}x\log\left(\frac{x+1}{x-1}\right)=\lim_{x\to\infty}\frac{-2/[(x-1)(x+1)]}{-1/x^2}=\lim_{x\to\infty}\frac{2}{(1-1/x)(1+1/x)}=2.$$

We can see from the foregoing examples that the use of l’Hospital’s rule can facilitate the process of finding the limit of the ratio $f(x)/g(x)$ as $x\to a$. In many cases, it is easier to work with $f'(x)/g'(x)$ than with the original ratio. Many other indeterminate forms can also be resolved by l’Hospital’s rule by first reducing them to the form $0/0$ or $\infty/\infty$, as was shown in Examples 4.2.4 and 4.2.5.
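The limits in Examples 4.2.1-4.2.5 can be corroborated, though not proved, by evaluating each expression at points that approach the limit point. The sample points below are an arbitrary choice of ours. For Example 4.2.3 we take the illustrative base $a=2$. For Example 4.2.2 the points stop at $10^{-3}$, because $1-\cos x$ suffers catastrophic cancellation in floating point for much smaller $x$.

```python
import math

# (description, expression, sample points moving toward the limit point)
EXAMPLES = [
    ("4.2.1  sin(x)/x, x -> 0 (limit 1)",
     lambda x: math.sin(x) / x, [1e-1, 1e-3, 1e-5]),
    ("4.2.2  (1 - cos x)/x^2, x -> 0 (limit 1/2)",
     lambda x: (1 - math.cos(x)) / x ** 2, [1e-1, 1e-2, 1e-3]),
    ("4.2.3  2^x / x, x -> infinity (grows without bound)",
     lambda x: 2.0 ** x / x, [10, 50, 100]),
    ("4.2.4  x^x, x -> 0+ (limit 1)",
     lambda x: x ** x, [1e-2, 1e-4, 1e-8]),
    ("4.2.5  x log[(x+1)/(x-1)], x -> infinity (limit 2)",
     lambda x: x * math.log((x + 1) / (x - 1)), [10, 1e3, 1e5]),
]

for label, expr, points in EXAMPLES:
    values = ", ".join(f"{expr(p):.8g}" for p in points)
    print(f"{label}:\n    {values}")
```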

It is important here to remember that the application of l’Hospital’s rule requires that the limit of $f'(x)/g'(x)$ exist as a finite number or be equal to infinity in the extended real number system as $x\to a$. If this is not the case, then it does not necessarily follow that the limit of $f(x)/g(x)$ does not exist. For example, consider $f(x)=x^2\sin(1/x)$ and $g(x)=x$. Here, $f(x)/g(x)$ tends to zero as $x\to 0$, as was seen earlier in this chapter. However, the ratio

$$\frac{f'(x)}{g'(x)}=2x\sin\frac{1}{x}-\cos\frac{1}{x}$$

has no limit as $x\to 0$, since it oscillates inside a small neighborhood of the origin.


4.3. TAYLOR’S THEOREM 

This theorem is also known as the general mean value theorem, since it is considered as an extension of the mean value theorem. It was formulated by the English mathematician Brook Taylor (1685-1731) in 1712 and has since become a very important theorem in calculus. Taylor used his theorem to expand functions into infinite series. However, full recognition of the importance of Taylor’s expansion was not realized until 1755, when Leonhard Euler (1707-1783) applied it in his differential calculus, and still later, when Joseph Louis Lagrange (1736-1813) used it as the foundation of his theory of functions.

Theorem 4.3.1 (Taylor’s Theorem). If the $(n-1)$st ($n\ge 1$) derivative of $f(x)$, namely $f^{(n-1)}(x)$, is continuous on the closed interval $[a,b]$ and the $n$th derivative $f^{(n)}(x)$ exists on the open interval $(a,b)$, then for each $x\in[a,b]$ we have

$$f(x)=f(a)+(x-a)f'(a)+\frac{(x-a)^2}{2!}f''(a)+\cdots+\frac{(x-a)^{n-1}}{(n-1)!}f^{(n-1)}(a)+\frac{(x-a)^n}{n!}f^{(n)}(\xi),$$

where $a<\xi<x$.

Proof. The method to prove this theorem is very similar to the one used for Theorem 4.2.2. For a fixed $x$ in $[a,b]$ let the function $\psi_n(t)$ be defined as

$$\psi_n(t)=g_n(t)-\left(\frac{x-t}{x-a}\right)^n g_n(a),$$

where $a\le t\le b$ and

$$g_n(t)=f(x)-f(t)-(x-t)f'(t)-\frac{(x-t)^2}{2!}f''(t)-\cdots-\frac{(x-t)^{n-1}}{(n-1)!}f^{(n-1)}(t). \tag{4.10}$$

The function $\psi_n(t)$ has the following properties:

1. $\psi_n(a)=0$ and $\psi_n(x)=0$.

2. $\psi_n(t)$ is a continuous function of $t$ on $[a,b]$.

3. The derivative of $\psi_n(t)$ with respect to $t$ exists on $(a,b)$. This derivative is equal to

$$\begin{aligned}
\psi_n'(t)&=g_n'(t)+\frac{n(x-t)^{n-1}}{(x-a)^n}g_n(a)\\
&=-f'(t)+f'(t)-(x-t)f''(t)+(x-t)f''(t)-\cdots-\frac{(x-t)^{n-1}}{(n-1)!}f^{(n)}(t)+\frac{n(x-t)^{n-1}}{(x-a)^n}g_n(a)\\
&=-\frac{(x-t)^{n-1}}{(n-1)!}f^{(n)}(t)+\frac{n(x-t)^{n-1}}{(x-a)^n}g_n(a).
\end{aligned}$$




By applying Rolle’s theorem to $\psi_n(t)$ on the interval $[a,x]$ we can assert that there exists a value $\xi$, $a<\xi<x$, such that $\psi_n'(\xi)=0$, that is,

$$-\frac{(x-\xi)^{n-1}}{(n-1)!}f^{(n)}(\xi)+\frac{n(x-\xi)^{n-1}}{(x-a)^n}g_n(a)=0,$$

or

$$g_n(a)=\frac{(x-a)^n}{n!}f^{(n)}(\xi). \tag{4.11}$$

Using formula (4.10) in (4.11), we finally get

$$f(x)=f(a)+(x-a)f'(a)+\frac{(x-a)^2}{2!}f''(a)+\cdots+\frac{(x-a)^{n-1}}{(n-1)!}f^{(n-1)}(a)+\frac{(x-a)^n}{n!}f^{(n)}(\xi). \tag{4.12}$$

This is known as Taylor’s formula. It can also be expressed as

$$f(a+h)=f(a)+hf'(a)+\frac{h^2}{2!}f''(a)+\cdots+\frac{h^{n-1}}{(n-1)!}f^{(n-1)}(a)+\frac{h^n}{n!}f^{(n)}(a+\theta_n h), \tag{4.13}$$

where $h=x-a$ and $0<\theta_n<1$. □

In particular, if $f(x)$ has derivatives of all orders in some neighborhood $N_r(a)$ of the point $a$, formula (4.12) can provide a series expansion of $f(x)$ for $x\in N_r(a)$ as $n\to\infty$. The last term in formula (4.12), or formula (4.13), is called the remainder of Taylor’s series and is denoted by $R_n$. Thus,

$$R_n=\frac{(x-a)^n}{n!}f^{(n)}(\xi)=\frac{h^n}{n!}f^{(n)}(a+\theta_n h).$$





If $R_n\to 0$ as $n\to\infty$, then

$$f(x)=f(a)+\sum_{n=1}^{\infty}\frac{(x-a)^n}{n!}f^{(n)}(a), \tag{4.14}$$

or

$$f(a+h)=f(a)+\sum_{n=1}^{\infty}\frac{h^n}{n!}f^{(n)}(a). \tag{4.15}$$

This results in what is known as Taylor’s series. Thus the validity of Taylor’s series is contingent on having $R_n\to 0$ as $n\to\infty$, and on having derivatives of all orders for $f(x)$ in $N_r(a)$. The existence of these derivatives alone is not sufficient to guarantee a valid expansion.

A special form of Taylor’s series is Maclaurin’s series, which results when $a=0$. Formula (4.14) then reduces to

$$f(x)=f(0)+\sum_{n=1}^{\infty}\frac{x^n}{n!}f^{(n)}(0). \tag{4.16}$$

In this case, the remainder takes the form

$$R_n=\frac{x^n}{n!}f^{(n)}(\theta_n x). \tag{4.17}$$

The sum of the first $n$ terms in Maclaurin’s series provides an approximation for the value of $f(x)$. The size of the remainder determines how close the sum is to $f(x)$. Since the remainder depends on $\theta_n$, which lies in the interval $(0,1)$, an upper bound on $R_n$ that is free of $\theta_n$ will therefore be needed to assess the accuracy of the approximation. For example, let us consider the function $f(x)=\cos x$. In this case,

$$f^{(n)}(x)=\cos\left(x+\frac{n\pi}{2}\right),\qquad n=1,2,\ldots,$$

and

$$f^{(n)}(0)=\cos\frac{n\pi}{2}=\begin{cases}0, & n\text{ odd},\\ (-1)^{n/2}, & n\text{ even}.\end{cases}$$





Formula (4.16) becomes

$$\cos x=\sum_{k=0}^{\infty}(-1)^k\frac{x^{2k}}{(2k)!}=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\cdots+(-1)^n\frac{x^{2n}}{(2n)!}+R_{2n+1},$$

where from formula (4.17) $R_{2n+1}$ is

$$R_{2n+1}=\frac{x^{2n+1}}{(2n+1)!}\cos\left[\theta_{2n+1}x+\frac{(2n+1)\pi}{2}\right].$$

An upper bound on $|R_{2n+1}|$ is given by

$$|R_{2n+1}|\le\frac{|x|^{2n+1}}{(2n+1)!}.$$

Therefore, the error of approximating $\cos x$ with the sum

$$s_{2n}=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\cdots+(-1)^n\frac{x^{2n}}{(2n)!}$$

does not exceed $|x|^{2n+1}/(2n+1)!$, where $x$ is measured in radians. For example, if $x=\pi/3$ and $n=3$, the sum

$$s_6=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}=0.49996$$

approximates $\cos(\pi/3)$ with an error not exceeding

$$\frac{|x|^7}{7!}=0.00027.$$

The true value of $\cos(\pi/3)$ is 0.5.
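The computation above is easy to reproduce. The sketch below evaluates the partial sum $s_{2n}$ and the remainder bound $|x|^{2n+1}/(2n+1)!$ for $x=\pi/3$, $n=3$. The function names are our own. The actual error, about $3.5\times 10^{-5}$, falls well within the bound.

```python
import math


def cos_maclaurin(x, n):
    """Partial sum s_{2n} = sum_{k=0}^{n} (-1)^k x^(2k) / (2k)!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n + 1))


def remainder_bound(x, n):
    """Upper bound |x|^(2n+1) / (2n+1)! on |R_{2n+1}|, with x in radians."""
    return abs(x) ** (2 * n + 1) / math.factorial(2 * n + 1)


x, n = math.pi / 3, 3
approx = cos_maclaurin(x, n)
bound = remainder_bound(x, n)
actual = abs(math.cos(x) - approx)
print(f"s_6 = {approx:.5f}   bound = {bound:.5f}   actual error = {actual:.2e}")
```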


4.4. MAXIMA AND MINIMA OF A FUNCTION 

In this section we consider the problem of finding the extreme values of a function $y=f(x)$ whose derivative $f'(x)$ exists in any open set inside its domain of definition.





Definition 4.4.1. A function $f: D\to R$ has a local (or relative) maximum at a point $x_0\in D$ if there exists a $\delta>0$ such that $f(x)\le f(x_0)$ for all $x\in N_\delta(x_0)\cap D$. The function $f$ has a local (or relative) minimum at $x_0$ if $f(x)\ge f(x_0)$ for all $x\in N_\delta(x_0)\cap D$. Local maxima and minima are referred to as local optima (or local extrema). □

Definition 4.4.2. A function $f: D\to R$ has an absolute maximum (minimum) over $D$ if there exists a point $x^*\in D$ such that $f(x)\le f(x^*)$ [$f(x)\ge f(x^*)$] for all $x\in D$. Absolute maxima and minima are called absolute optima (or extrema). □

The determination of local optima of $f(x)$ can be greatly facilitated if $f(x)$ is differentiable.

Theorem 4.4.1. Let $f(x)$ be differentiable on the open interval $(a,b)$. If $f(x)$ has a local maximum, or a local minimum, at a point $x_0$ in $(a,b)$, then $f'(x_0)=0$.

Proof. Suppose that $f(x)$ has a local maximum at $x_0$. Then, $f(x)\le f(x_0)$ for all $x$ in some neighborhood $N_\delta(x_0)\subset(a,b)$. It follows that

$$\frac{f(x)-f(x_0)}{x-x_0}\ \begin{cases}\le 0 & \text{if } x>x_0,\\ \ge 0 & \text{if } x<x_0,\end{cases} \tag{4.18}$$

for all $x\neq x_0$ in $N_\delta(x_0)$. As $x\to x_0^+$, the ratio in (4.18) will have a nonpositive limit, and if $x\to x_0^-$, the ratio will have a nonnegative limit. Since $f'(x_0)$ exists, these two limits must be equal and equal to $f'(x_0)$ as $x\to x_0$. We therefore conclude that $f'(x_0)=0$. The proof when $f(x)$ has a local minimum is similar. □

It is important here to note that $f'(x_0)=0$ is a necessary condition for a differentiable function to have a local optimum at $x_0$. It is not, however, a sufficient condition. That is, if $f'(x_0)=0$, then it is not necessarily true that $x_0$ is a point of local optimum. For example, the function $f(x)=x^3$ has a zero derivative at the origin, but $f(x)$ does not have a local optimum there (why not?). In general, a value $x_0$ for which $f'(x_0)=0$ is called a stationary value for the function. Thus a stationary value does not necessarily correspond to a local optimum.

We should also note that Theorem 4.4.1 assumes that $f(x)$ is differentiable in a neighborhood of $x_0$. If this condition is not fulfilled, the theorem ceases to be true. The existence of $f'(x_0)$ is not a prerequisite for $f(x)$ to have a local optimum at $x_0$. In fact, $f(x)$ can have a local optimum at $x_0$ even if $f'(x_0)$ does not exist. For example, $f(x)=|x|$ has a local minimum at $x=0$, but $f'(0)$ does not exist.





We recall from Corollary 3.4.1 that if $f(x)$ is continuous on $[a,b]$, then it must achieve its absolute optima at some points inside $[a,b]$. These points can be interior points, that is, points that belong to the open interval $(a,b)$, or they can be end (boundary) points. In particular, if $f'(x)$ exists on $(a,b)$, to determine the locations of the absolute optima we must solve the equation $f'(x)=0$ and then compare the values of $f(x)$ at the roots of this equation with $f(a)$ and $f(b)$. The largest (smallest) of these values is the absolute maximum (minimum). In the event $f'(x)\neq 0$ on $(a,b)$, then $f(x)$ must achieve its absolute optimum at an end point.
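The recipe above (solve $f'(x)=0$, then compare with the end points) is easy to automate when the stationary points can be found exactly. The sketch below applies it to the polynomial of Example 4.4.1, $f(x)=2x^3-3x^2-12x+6$, on the interval $[-2,3]$. That interval is our own hypothetical choice, since the text does not pair this function with one. The helper `stationary_points` is ours and works only because $f'$ is a quadratic here.

```python
import math


def f(x):
    return 2 * x ** 3 - 3 * x ** 2 - 12 * x + 6


def stationary_points(a, b):
    """Real roots of f'(x) = 6x^2 - 6x - 12 lying in the open interval (a, b)."""
    qa, qb, qc = 6.0, -6.0, -12.0          # coefficients of f'(x)
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []
    root = math.sqrt(disc)
    candidates = [(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)]
    return [x for x in candidates if a < x < b]


a, b = -2.0, 3.0                           # hypothetical closed interval [a, b]
points = [a, b] + stationary_points(a, b)
values = {x: f(x) for x in points}
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(values)
print(f"absolute maximum {values[x_max]} at x = {x_max}; "
      f"absolute minimum {values[x_min]} at x = {x_min}")
```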


4.4.1. A Sufficient Condition for a Local Optimum 

We shall make use of Taylor’s expansion to come up with a sufficient condition for $f(x)$ to have a local optimum at $x=x_0$.

Suppose that $f(x)$ has $n$ derivatives in a neighborhood $N_\delta(x_0)$ such that $f'(x_0)=f''(x_0)=\cdots=f^{(n-1)}(x_0)=0$, but $f^{(n)}(x_0)\neq 0$. Then by Taylor’s theorem we have

$$f(x)=f(x_0)+\frac{h^n}{n!}f^{(n)}(x_0+\theta_n h)$$

for any $x$ in $N_\delta(x_0)$, where $h=x-x_0$ and $0<\theta_n<1$. Furthermore, if we assume that $f^{(n)}(x)$ is continuous at $x_0$, then

$$f^{(n)}(x_0+\theta_n h)=f^{(n)}(x_0)+o(1),$$

where, as we recall from Section 3.3, $o(1)\to 0$ as $h\to 0$. We can therefore write

$$f(x)-f(x_0)=\frac{h^n}{n!}f^{(n)}(x_0)+o(h^n). \tag{4.19}$$

In order for $f(x)$ to have a local optimum at $x_0$, $f(x)-f(x_0)$ must have the same sign (positive or negative) for small values of $h$ inside a neighborhood of 0. But, from (4.19), the sign of $f(x)-f(x_0)$ is determined by the sign of $h^nf^{(n)}(x_0)$. We can then conclude that if $n$ is even, then a local optimum is achieved at $x_0$. In this case, a local maximum occurs at $x_0$ if $f^{(n)}(x_0)<0$, whereas $f^{(n)}(x_0)>0$ indicates that $x_0$ is a point of local minimum. If, however, $n$ is odd, then $x_0$ is not a point of local optimum, since $f(x)-f(x_0)$ changes sign around $x_0$. In this case, the point on the graph of $y=f(x)$ whose abscissa is $x_0$ is called a saddle point.

In particular, if $f'(x_0)=0$ and $f''(x_0)\neq 0$, then $x_0$ is a point of local optimum. When $f''(x_0)<0$, $f(x)$ has a local maximum at $x_0$, and when $f''(x_0)>0$, $f(x)$ has a local minimum at $x_0$.





Example 4.4.1. Let $f(x)=2x^3-3x^2-12x+6$. Then $f'(x)=6x^2-6x-12=0$ at $x=-1,2$, and

$$f''(x)=12x-6=\begin{cases}-18 & \text{at } x=-1,\\ 18 & \text{at } x=2.\end{cases}$$

We have then a local maximum at $x=-1$ and a local minimum at $x=2$.

Example 4.4.2. $f(x)=x^4-1$. In this case,

$$\begin{aligned}
f'(x)&=4x^3=0 \text{ at } x=0,\\
f''(x)&=12x^2=0 \text{ at } x=0,\\
f'''(x)&=24x=0 \text{ at } x=0,\\
f^{(4)}(x)&=24.
\end{aligned}$$

Then, $x=0$ is a point of local minimum.

Example 4.4.3. Consider $f(x)=(x+5)^2(x^3-10)$. We have

$$\begin{aligned}
f'(x)&=5(x+5)(x-1)(x+2)^2,\\
f''(x)&=10(x+2)(2x^2+8x-1),\\
f'''(x)&=10(6x^2+24x+15).
\end{aligned}$$

Here, $f'(x)=0$ at $x=-5$, $-2$, and $1$. At $x=-5$ there is a local maximum, since $f''(-5)=-270<0$. At $x=1$ we have a local minimum, since $f''(1)=270>0$. However, at $x=-2$ a saddle point occurs, since $f''(-2)=0$ and $f'''(-2)=-90\neq 0$.
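For polynomials, the test of Section 4.4.1 can be mechanized by differentiating repeatedly until a derivative fails to vanish at the stationary point. The sketch below does this with exact integer arithmetic, so the zero checks are reliable. It then reproduces the conclusions of Examples 4.4.2 and 4.4.3. The coefficient-list representation, highest degree first, and all helper names are our own choices. For Example 4.4.3 we expand $(x+5)^2(x^3-10)=x^5+10x^4+25x^3-10x^2-100x-250$.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients from highest degree down) by Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def poly_deriv(coeffs):
    """Coefficients of the derivative, highest degree first."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def classify_stationary_point(coeffs, x0):
    """Higher-order derivative test at a stationary point x0.

    Returns (classification, n), where n is the order of the first
    derivative that does not vanish at x0. Integer or Fraction inputs
    keep the arithmetic exact.
    """
    d = poly_deriv(coeffs)
    if poly_eval(d, x0) != 0:
        raise ValueError("f'(x0) != 0, so x0 is not a stationary point")
    n = 1
    while True:
        d = poly_deriv(d)
        n += 1
        if not d:
            raise ValueError("all derivatives vanish: f is constant")
        value = poly_eval(d, x0)
        if value != 0:
            break
    if n % 2 == 1:
        return "saddle point", n
    return ("local minimum" if value > 0 else "local maximum"), n


example_443 = [1, 10, 25, -10, -100, -250]   # (x + 5)^2 (x^3 - 10)
for x0 in (-5, -2, 1):
    print(x0, classify_stationary_point(example_443, x0))

example_442 = [1, 0, 0, 0, -1]               # x^4 - 1
print(0, classify_stationary_point(example_442, 0))
```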


Example 4.4.4. $f(x)=(2x+1)/(x+4)$, $0\le x\le 5$. Then

$$f'(x)=7/(x+4)^2.$$

In this case, $f'(x)$ does not vanish anywhere in $(0,5)$. Thus $f(x)$ has no local maxima or local minima in that open interval. Being continuous on $[0,5]$, $f(x)$ must achieve its absolute optima at the end points. Since $f'(x)>0$, $f(x)$ is strictly monotone increasing on $[0,5]$ by Corollary 4.2.1. Its absolute minimum and absolute maximum are therefore attained at $x=0$ and $x=5$, respectively.


4.5. APPLICATIONS IN STATISTICS 

Differential calculus has many applications in statistics. Let us consider some 
of these applications. 





4.5.1. Functions of Random Variables 


Let $Y$ be a continuous random variable whose cumulative distribution function is $F(y)=P(Y\le y)$. If $F(y)$ is differentiable for all $y$, then its derivative $F'(y)$ is called the density function of $Y$ and is denoted by $f(y)$. Continuous random variables for which $f(y)$ exists are said to be absolutely continuous.

Let $Y$ be an absolutely continuous random variable, and let $W$ be another random variable which can be expressed as a function of $Y$ of the form $W=\psi(Y)$. Suppose that this function is strictly monotone and differentiable over its domain. By Theorem 3.5.1, $\psi$ has a unique inverse $\psi^{-1}$, which is also differentiable by Theorem 4.2.4. Let $G(w)$ denote the cumulative distribution function of $W$.

If $\psi$ is strictly monotone increasing, then

$$G(w)=P(W\le w)=P[Y\le\psi^{-1}(w)]=F[\psi^{-1}(w)].$$

If it is strictly monotone decreasing, then

$$G(w)=P(W\le w)=P[Y\ge\psi^{-1}(w)]=1-F[\psi^{-1}(w)].$$

By differentiating $G(w)$ using the chain rule we obtain the density function $g(w)$ for $W$, namely,

$$g(w)=\frac{dF[\psi^{-1}(w)]}{d\psi^{-1}(w)}\,\frac{d\psi^{-1}(w)}{dw}=f[\psi^{-1}(w)]\frac{d\psi^{-1}(w)}{dw} \tag{4.20}$$

if $\psi$ is strictly monotone increasing, and

$$g(w)=-\frac{dF[\psi^{-1}(w)]}{d\psi^{-1}(w)}\,\frac{d\psi^{-1}(w)}{dw}=-f[\psi^{-1}(w)]\frac{d\psi^{-1}(w)}{dw} \tag{4.21}$$

if $\psi$ is strictly monotone decreasing. By combining (4.20) and (4.21) we obtain

$$g(w)=f[\psi^{-1}(w)]\left|\frac{d\psi^{-1}(w)}{dw}\right|. \tag{4.22}$$

For example, suppose that $Y$ has the uniform distribution $U(0,1)$ whose density function is

$$f(y)=\begin{cases}1, & 0<y<1,\\ 0 & \text{elsewhere}.\end{cases}$$





Let $W = -\log Y$. Using formula (4.22), the density function of $W$ is given by

$$g(w) = \begin{cases} e^{-w}, & 0 < w < \infty, \\ 0 & \text{elsewhere}. \end{cases}$$
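Formula (4.22) can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `transformed_density` and the central-difference step `h` are illustrative assumptions, and the derivative of $\psi^{-1}$ is approximated numerically rather than computed exactly. It reproduces the density $e^{-w}$ of $W = -\log Y$ for $Y \sim U(0, 1)$.

```python
import math


def transformed_density(f, psi_inv, w, h=1e-6):
    """Density of W = psi(Y) at w by formula (4.22).

    g(w) = f(psi_inv(w)) * |d psi_inv(w)/dw|, where the derivative of
    psi_inv is approximated by a central difference with step h.
    """
    slope = (psi_inv(w + h) - psi_inv(w - h)) / (2 * h)
    return f(psi_inv(w)) * abs(slope)


def uniform_density(y):
    """Density of the uniform distribution U(0, 1)."""
    return 1.0 if 0.0 < y < 1.0 else 0.0


def psi_inv(w):
    """Inverse of psi(y) = -log y, i.e. y = exp(-w)."""
    return math.exp(-w)


for w in (0.5, 1.0, 2.0, -1.0):
    print(w, transformed_density(uniform_density, psi_inv, w),
          math.exp(-w) if w > 0 else 0.0)
```

For $w \le 0$ the point $\psi^{-1}(w) = e^{-w}$ falls outside $(0, 1)$, so the computed density is zero, matching the "0 elsewhere" branch.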


The Mean and Variance of $W = \psi(Y)$

The mean and variance of the random variable $W$ can be obtained by using its density function:

$$E(W) = \int_{-\infty}^{\infty} w\,g(w)\,dw,$$

$$\operatorname{Var}(W) = E[W - E(W)]^2 = \int_{-\infty}^{\infty} [w - E(W)]^2\,g(w)\,dw.$$


In some cases, however, the exact distribution of $Y$ may not be known, or $g(w)$ may be a complicated function to integrate. In such cases, approximate expressions for the mean and variance of $W$ can be obtained by applying Taylor's expansion around the mean of $Y$, $\mu$. If we assume that $\psi''(y)$ exists, then

$$\psi(y) = \psi(\mu) + (y - \mu)\,\psi'(\mu) + o(y - \mu).$$

If $o(y - \mu)$ is small enough, first-order approximations of $E(W)$ and $\operatorname{Var}(W)$ can be obtained, namely,

$$E(W) \approx \psi(\mu), \quad \text{since } E(Y - \mu) = 0; \qquad \operatorname{Var}(W) \approx \sigma^2[\psi'(\mu)]^2, \tag{4.23}$$

where $\sigma^2 = \operatorname{Var}(Y)$, and the symbol $\approx$ denotes approximate equality. If $o(y - \mu)$ is not small enough, then higher-order approximations can be utilized provided that certain derivatives of $\psi(y)$ exist. For example, if $\psi'''(y)$ exists, then

$$\psi(y) = \psi(\mu) + (y - \mu)\,\psi'(\mu) + \tfrac{1}{2}(y - \mu)^2\,\psi''(\mu) + o[(y - \mu)^2].$$

In this case, if $o[(y - \mu)^2]$ is small enough, then second-order approximations can be obtained for $E(W)$ and $\operatorname{Var}(W)$ of the form

$$E(W) \approx \psi(\mu) + \tfrac{1}{2}\sigma^2\,\psi''(\mu), \quad \text{since } E(Y - \mu)^2 = \sigma^2,$$

$$\operatorname{Var}(W) \approx E\{Q(Y) - E[Q(Y)]\}^2,$$





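where

$$Q(Y) = \psi(\mu) + (Y - \mu)\,\psi'(\mu) + \tfrac{1}{2}(Y - \mu)^2\,\psi''(\mu).$$

Thus,

$$\operatorname{Var}(W) \approx \sigma^2[\psi'(\mu)]^2 + \tfrac{1}{4}[\psi''(\mu)]^2 \operatorname{Var}[(Y - \mu)^2] + \psi'(\mu)\,\psi''(\mu)\,E[(Y - \mu)^3].$$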
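To see how the first- and second-order approximations compare, the Python sketch below evaluates them for an illustrative case that is not in the text: $Y \sim N(0, 0.01)$ and $W = e^Y$, for which the exact lognormal moments $E(W) = e^{\mu + \sigma^2/2}$ and $\operatorname{Var}(W) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$ are known. For a normal $Y$, $E[(Y - \mu)^3] = 0$ and $\operatorname{Var}[(Y - \mu)^2] = 2\sigma^4$. The function name `taylor_moments` is hypothetical.

```python
import math


def taylor_moments(psi, dpsi, d2psi, mu, sigma2, m3, var_sq):
    """Taylor approximations of E(W) and Var(W) for W = psi(Y) about mu = E(Y).

    sigma2 = Var(Y), m3 = E[(Y - mu)^3], var_sq = Var[(Y - mu)^2].
    Returns (first-order mean, first-order variance,
             second-order mean, second-order variance).
    """
    d1, d2 = dpsi(mu), d2psi(mu)
    mean1 = psi(mu)                     # first-order mean, (4.23)
    var1 = sigma2 * d1 ** 2             # first-order variance, (4.23)
    mean2 = psi(mu) + 0.5 * sigma2 * d2
    var2 = sigma2 * d1 ** 2 + 0.25 * d2 ** 2 * var_sq + d1 * d2 * m3
    return mean1, var1, mean2, var2


# Illustrative assumption: Y ~ N(0, 0.01) and W = exp(Y), so psi = psi' = psi'' = exp.
mu, s2 = 0.0, 0.01
approx = taylor_moments(math.exp, math.exp, math.exp, mu, s2, 0.0, 2 * s2 ** 2)
exact_mean = math.exp(mu + s2 / 2)                      # lognormal mean
exact_var = (math.exp(s2) - 1) * math.exp(2 * mu + s2)  # lognormal variance
print(approx)
print(exact_mean, exact_var)
```

Here the second-order mean $1.005$ is much closer to the exact $e^{0.005}$ than the first-order value $1$, and the second-order variance also improves on $\sigma^2[\psi'(\mu)]^2 = 0.01$.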

Variance Stabilizing Transformations

One of the basic assumptions of regression and analysis of variance is the constancy of the variance $\sigma^2$ of a response variable $Y$ on which experimental data are obtained. This assumption is often referred to as the assumption of homoscedasticity. There are situations, however, in which $\sigma^2$ is not constant for all the data. When this happens, $Y$ is said to be heteroscedastic. Heteroscedasticity can cause problems and difficulties in connection with the statistical analysis of the data (for a survey of the problems of heteroscedasticity, see Judge et al., 1980).

Some situations that lead to heteroscedasticity are (see Wetherill et al., 1986, page 200):

i. The Use of Averaged Data. In many experimental situations, the data used in a regression program consist of averages of samples that are different in size. This happens sometimes in survey analysis.

ii. Variances Depending on the Explanatory Variables. The variance of an observation can sometimes depend on the explanatory (or input) variables in the hypothesized model, as is the case with some econometric models. For example, if the response variable is household expenditure and one explanatory variable is household income, then the variance of the observations may be a function of household income.

iii. Variances Depending on the Mean Response. The response variable $Y$ may have a distribution whose variance is a function of its mean, that is, $\sigma^2 = h(\mu)$, where $\mu$ is the mean of $Y$. The Poisson distribution, for example, has the property that $\sigma^2 = \mu$. Thus as $\mu$ changes (as a function of some explanatory variables), then so will $\sigma^2$. The following example illustrates this situation (see Chatterjee and Price, 1977, page 39): Let $Y$ be the number of accidents, and $x$ be the speed of operating a lathe in a machine shop. Suppose that a linear relationship is assumed between $Y$ and $x$ of the form

$$Y = \beta_0 + \beta_1 x + \epsilon,$$

where $\epsilon$ is a random error with a zero mean. Here, $Y$ has the Poisson distribution with mean $\mu = \beta_0 + \beta_1 x$. The variance of $Y$, being equal to $\mu$, will not be constant, since it depends on $x$.





Heteroscedasticity due to dependence on the mean response can be removed, or at least reduced, by a suitable transformation of the response variable $Y$. So let us suppose that $\sigma^2 = h(\mu)$. Let $W = \psi(Y)$. We need to find a proper transformation $\psi$ that causes $W$ to have almost the constant variance property. If this can be accomplished, then $\psi$ is referred to as a variance stabilizing transformation.

If the first-order approximation of $\operatorname{Var}(W)$ by Taylor's expansion is adequate, then by formula (4.23) we can select $\psi$ so that

$$h(\mu)\,[\psi'(\mu)]^2 = c, \tag{4.24}$$

where $c$ is a constant. Without loss of generality, let $c = 1$. A solution of (4.24) is given by

$$\psi(\mu) = \int \frac{d\mu}{\sqrt{h(\mu)}}.$$

Thus if $W = \psi(Y)$, then $W$ will have a variance approximately equal to one. For example, if $h(\mu) = \mu$, as is the case with the Poisson distribution, then

$$\psi(\mu) = \int \frac{d\mu}{\sqrt{\mu}} = 2\sqrt{\mu}.$$

Hence, $W = 2\sqrt{Y}$ will have a variance approximately equal to one. (In this case, it is more common to use the transformation $W = \sqrt{Y}$, which has a variance approximately equal to 0.25.) Thus in the earlier example regarding the relationship between the number of accidents and the speed of operating a lathe, we need to regress $\sqrt{Y}$ against $x$ in order to ensure approximate homoscedasticity.
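As a numerical check of the square-root transformation, the following Python sketch (an illustration, not from the source) computes $\operatorname{Var}(Y)$ and $\operatorname{Var}(\sqrt{Y})$ exactly for Poisson $Y$ by summing the probability mass function; the truncation point of the sum and the helper name `poisson_variance` are assumptions. As $\mu$ grows, $\operatorname{Var}(Y) = \mu$ grows with it, while $\operatorname{Var}(\sqrt{Y})$ settles near 0.25.

```python
import math


def poisson_variance(mu, transform, tail_sd=12.0):
    """Exact variance of transform(Y) for Y ~ Poisson(mu), mu > 0.

    The pmf is summed over k = 0, ..., kmax, with kmax about tail_sd
    standard deviations above the mean; the neglected tail is negligible.
    """
    kmax = int(mu + tail_sd * math.sqrt(mu) + 20)
    # log p(k) = -mu + k log mu - log k!, computed with lgamma for stability.
    probs = [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
             for k in range(kmax + 1)]
    m1 = sum(p * transform(k) for k, p in enumerate(probs))
    m2 = sum(p * transform(k) ** 2 for k, p in enumerate(probs))
    return m2 - m1 ** 2


for mu in (5.0, 20.0, 100.0):
    print(mu, poisson_variance(mu, lambda k: k), poisson_variance(mu, math.sqrt))
```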

The relationship (if any) between $\sigma^2$ and $\mu$ may be determined by theoretical considerations based on a knowledge of the type of data used, for example, Poisson data. In practice, however, such knowledge may not be available a priori. In this case, the appropriate transformation is selected empirically on the basis of residual analysis of the data. See, for example, Box and Draper (1987, Chapter 8) and Montgomery and Peck (1982, Chapter 3). If possible, a transformation is selected to correct nonnormality (if the original data are not believed to be normally distributed) as well as heteroscedasticity. In this respect, a useful family of transformations introduced by Box and Cox (1964) can be used. These authors considered the power family of transformations defined by

$$\phi(Y) = \begin{cases} (Y^{\lambda} - 1)/\lambda, & \lambda \ne 0, \\ \log Y, & \lambda = 0. \end{cases}$$





This family may only be applied when $Y$ has positive values. Furthermore, since by l'Hospital's rule

$$\lim_{\lambda \to 0} \frac{Y^{\lambda} - 1}{\lambda} = \lim_{\lambda \to 0} Y^{\lambda}\log Y = \log Y,$$

the Box-Cox transformation is a continuous function of $\lambda$. An estimate of $\lambda$ can be obtained from the data using the method of maximum likelihood (see Montgomery and Peck, 1982, Section 3.7.1; Box and Draper, 1987, Section 8.4).
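A direct transcription of the Box-Cox family into Python is sketched below (the function name `box_cox` is hypothetical). The printed values illustrate the continuity in $\lambda$ established by l'Hospital's rule: as $\lambda \to 0$, $(y^{\lambda} - 1)/\lambda$ approaches $\log y$. The maximum likelihood estimation of $\lambda$ mentioned above is not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox transformation: (y**lam - 1)/lam for lam != 0, log y for lam = 0."""
    if y <= 0:
        raise ValueError("the Box-Cox family requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


y = 5.0
for lam in (1.0, 0.5, 0.1, 0.001, 1e-6, 0.0):
    print(lam, box_cox(y, lam))
print("log y =", math.log(y))
```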

Asymptotic Distributions

The asymptotic distributions of functions of random variables are of special interest in statistical limit theory. By definition, a sequence of random variables $\{Y_n\}_{n=1}^{\infty}$ converges in distribution to the random variable $Y$ if

$$\lim_{n \to \infty} F_n(y) = F(y)$$

at each point $y$ where $F(y)$ is continuous, where $F_n(y)$ is the cumulative distribution function of $Y_n$ ($n = 1, 2, \ldots$) and $F(y)$ is the cumulative distribution function of $Y$ (see Section 5.3 concerning sequences of functions). This form of convergence is denoted by writing

$$Y_n \xrightarrow{d} Y.$$

An illustration of convergence in distribution is provided by the well-known central limit theorem. It states that if $\{Y_n\}_{n=1}^{\infty}$ is a sequence of independent and identically distributed random variables with common mean and variance, $\mu$ and $\sigma^2$, respectively, that are both finite, and if $\bar{Y}_n = \sum_{i=1}^{n} Y_i/n$ is the sample mean of a sample of size $n$, then as $n \to \infty$,

$$\frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} Z,$$

where $Z$ has the standard normal distribution $N(0, 1)$.

An extension of the central limit theorem that includes functions of random variables is given by the following theorem:

Theorem 4.5.1. Let $\{Y_n\}_{n=1}^{\infty}$ be a sequence of independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$ (both finite), and let $\bar{Y}_n$ be the sample mean of a sample of size $n$. If $\psi(y)$ is a function whose derivative $\psi'(y)$ exists and is continuous in a neighborhood of $\mu$ such that





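$\psi'(\mu) \ne 0$, then as $n \to \infty$,

$$\frac{\psi(\bar{Y}_n) - \psi(\mu)}{(\sigma/\sqrt{n})\,|\psi'(\mu)|} \xrightarrow{d} Z.$$

Proof. See Wilks (1962, page 259). □

On the basis of Theorem 4.5.1 we can assert that when $n$ is large enough, $\psi(\bar{Y}_n)$ is approximately distributed as a normal variate with a mean $\psi(\mu)$ and a standard deviation $(\sigma/\sqrt{n})\,|\psi'(\mu)|$. For example, if $\psi(y) = y^2$ (with $\mu \ne 0$), then as $n \to \infty$,

$$\frac{\bar{Y}_n^2 - \mu^2}{2|\mu|\,\sigma/\sqrt{n}} \xrightarrow{d} Z.$$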
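Theorem 4.5.1 can be illustrated by simulation. In the Python sketch below (an illustration, not from the source), the data are exponential with $\mu = \sigma = 1$, an assumption chosen only because the exponential distribution is skewed and clearly non-normal; for $\psi(y) = y^2$ the theorem suggests that $\sqrt{n}(\bar{Y}_n^2 - \mu^2)$ has a standard deviation close to $2|\mu|\sigma = 2$ when $n$ is large. The sample size, number of replications, and seed are arbitrary choices.

```python
import math
import random
import statistics


def simulate_delta_method(n=200, reps=2000, seed=1):
    """Return replicates of sqrt(n) * (Ybar_n**2 - mu**2) for exponential data
    with mean mu = 1 and standard deviation sigma = 1."""
    rng = random.Random(seed)
    mu = 1.0
    values = []
    for _ in range(reps):
        ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append(math.sqrt(n) * (ybar ** 2 - mu ** 2))
    return values


values = simulate_delta_method()
print("mean:", statistics.mean(values))   # near 0 for large n
print("sd:  ", statistics.stdev(values))  # near 2|mu|sigma = 2
```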



4.5.2. Approximating Response Functions

Perhaps the most prevalent use of Taylor's expansion in statistics is in the area of linear models. Let $Y$ denote a response variable, such as the yield of a product, whose mean $\mu(x)$ is believed to depend on an explanatory (or input) variable $x$ such as temperature or pressure. The true relationship between $\mu$ and $x$ is usually unknown. However, if $\mu(x)$ is considered to have derivatives of all orders, then it is possible to approximate its values by using low-order terms of a Taylor's series over a limited range of interest. In this case, $\mu(x)$ can be represented approximately by a polynomial of degree $d$ ($\ge 1$) of the form

$$\mu(x) = \beta_0 + \sum_{j=1}^{d} \beta_j x^j,$$

where $\beta_0, \beta_1, \ldots, \beta_d$ are unknown parameters. Estimates of these parameters are obtained by running $n$ experiments in which $n$ observations, $y_1, y_2, \ldots, y_n$, on $Y$ are obtained for specified values of $x$. This leads us to the linear model

$$y_i = \beta_0 + \sum_{j=1}^{d} \beta_j x_i^j + \epsilon_i, \quad i = 1, 2, \ldots, n, \tag{4.25}$$

where $\epsilon_i$ is a random error. The method of least squares can then be used to estimate the unknown parameters in (4.25). The adequacy of model (4.25) to represent the true mean response $\mu(x)$ can be checked using the given data provided that replicated observations are available at some points inside the region of interest. For more details concerning the adequacy of fit of linear models and the method of least squares, see, for example, Box and Draper (1987, Chapters 2 and 3) and Khuri and Cornell (1996, Chapter 2).
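The least-squares step can be sketched in Python with the standard library alone. The helper `fit_polynomial` below is hypothetical: it forms the normal equations $(X'X)\boldsymbol{\beta} = X'\mathbf{y}$ for model (4.25) and solves them by Gaussian elimination with partial pivoting. That is adequate for small, well-conditioned illustrations; in practice a QR-based solver is numerically preferable. The data are made up so that they lie exactly on $1 + 2x - 0.5x^2$, so the fit should recover those coefficients.

```python
def fit_polynomial(xs, ys, d):
    """Least-squares estimates [b0, b1, ..., bd] for y_i = b0 + sum_j bj x_i^j + e_i."""
    p = d + 1
    X = [[x ** j for j in range(p)] for x in xs]   # design matrix rows
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * y for row, y in zip(X, ys))]
         for r in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in reversed(range(p)):
        beta[r] = (A[r][p] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]
beta = fit_polynomial(xs, ys, d=2)
print(beta)  # approximately [1.0, 2.0, -0.5]
```

With real data the residuals would be nonzero, and replicated observations would be needed to test the adequacy of fit, as noted above; that test is not shown here.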





4.5.3. The Poisson Process

A random phenomenon that arises through a process which continues in time (or space) in a manner controlled by the laws of probability is called a stochastic process. A particular example of such a process is the Poisson process, which is associated with the number of events that take place over a period of time, for example, the arrival of customers at a service counter, or the arrival of α-rays, emitted from a radioactive source, at a Geiger counter.

Define $p_n(t)$ as the probability of $n$ arrivals during a time interval of length $t$. For a Poisson process, the following postulates are assumed to hold:

1. The probability of exactly one arrival during a small time interval of length $h$ is approximately proportional to $h$, that is,
$$p_1(h) = \lambda h + o(h)$$
as $h \to 0$, where $\lambda$ is a constant.

2. The probability of more than one arrival during a small time interval of length $h$ is negligible, that is,
$$\sum_{n > 1} p_n(h) = o(h)$$
as $h \to 0$.

3. The probability of an arrival occurring during a small time interval $(t, t + h)$ does not depend on what happened prior to $t$. This means that the events defined according to the number of arrivals occurring during nonoverlapping time intervals are independent.

On the basis of the above postulates, an expression for $p_n(t)$ can be found as follows: For $n \ge 1$ and for small $h$ we have approximately

$$\begin{aligned} p_n(t + h) &= p_n(t)\,p_0(h) + p_{n-1}(t)\,p_1(h) \\ &= p_n(t)[1 - \lambda h + o(h)] + p_{n-1}(t)[\lambda h + o(h)], \end{aligned} \tag{4.26}$$

since the probability of no arrivals during the time interval $(t, t + h)$ is approximately equal to $1 - p_1(h)$. For $n = 0$ we have

$$p_0(t + h) = p_0(t)\,p_0(h) = p_0(t)[1 - \lambda h + o(h)]. \tag{4.27}$$





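From (4.26) and (4.27) we then get for $n \ge 1$,

$$\frac{p_n(t + h) - p_n(t)}{h} = p_n(t)\left[-\lambda + \frac{o(h)}{h}\right] + p_{n-1}(t)\left[\lambda + \frac{o(h)}{h}\right],$$

and for $n = 0$,

$$\frac{p_0(t + h) - p_0(t)}{h} = p_0(t)\left[-\lambda + \frac{o(h)}{h}\right].$$

By taking the limit as $h \to 0$ we obtain the derivatives

$$p_n'(t) = -\lambda p_n(t) + \lambda p_{n-1}(t), \quad n \ge 1, \tag{4.28}$$

$$p_0'(t) = -\lambda p_0(t). \tag{4.29}$$

From (4.29) the solution for $p_0(t)$ is given by

$$p_0(t) = e^{-\lambda t}, \tag{4.30}$$

since $p_0(t) = 1$ when $t = 0$ (that is, initially there were no arrivals). By substituting (4.30) in (4.28) when $n = 1$ we get

$$p_1'(t) = -\lambda p_1(t) + \lambda e^{-\lambda t}. \tag{4.31}$$

If we now multiply the two sides of (4.31) by $e^{\lambda t}$ we obtain

$$e^{\lambda t} p_1'(t) + \lambda p_1(t)\,e^{\lambda t} = \lambda,$$

or

$$[e^{\lambda t} p_1(t)]' = \lambda.$$

Hence,

$$e^{\lambda t} p_1(t) = \lambda t + c,$$

where $c$ is a constant. This constant must be equal to zero, since $p_1(0) = 0$. We then have

$$p_1(t) = \lambda t\,e^{-\lambda t}.$$

By continuing in this process and using equation (4.28) we can find $p_2(t)$, then $p_3(t), \ldots$, etc. In general, it can be shown that

$$p_n(t) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}, \quad n = 0, 1, 2, \ldots. \tag{4.32}$$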





In particular, if $t = 1$, then formula (4.32) gives the probability of $n$ arrivals during one unit of time, namely,

$$p_n(1) = \frac{e^{-\lambda}\lambda^n}{n!}, \quad n = 0, 1, \ldots.$$

This gives the probability mass function of a Poisson random variable with mean $\lambda$.
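The differential-difference equations (4.28) and (4.29) can also be integrated numerically and compared with the closed form (4.32). The Python sketch below uses Euler's method, a deliberately simple choice made for illustration; the function name, the truncation at `n_max` states, and the step count are assumptions. Truncation does not affect the computed $p_0, \ldots, p_{n_{\max}}$, because each $p_n'$ depends only on $p_n$ and $p_{n-1}$.

```python
import math


def poisson_process_euler(lam, t, n_max=10, steps=20_000):
    """Approximate p_0(t), ..., p_{n_max}(t) by Euler's method applied to
    p_0' = -lam*p_0 and p_n' = -lam*p_n + lam*p_{n-1} (n >= 1),
    starting from p_0(0) = 1 and p_n(0) = 0 for n >= 1."""
    h = t / steps
    p = [1.0] + [0.0] * n_max
    for _ in range(steps):
        # The right-hand side uses only the previous values of p.
        p = [p[0] - h * lam * p[0]] + [
            p[n] + h * (-lam * p[n] + lam * p[n - 1]) for n in range(1, n_max + 1)
        ]
    return p


lam, t = 2.0, 1.0
approx = poisson_process_euler(lam, t)
for n in range(6):
    exact = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    print(n, round(approx[n], 5), round(exact, 5))
```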


4.5.4. Minimizing the Sum of Absolute Deviations

Consider a data set consisting of $n$ observations $y_1, y_2, \ldots, y_n$. For an arbitrary real number $a$, let $D(a)$ denote the sum of absolute deviations of the data from $a$, that is,

$$D(a) = \sum_{i=1}^{n} |y_i - a|.$$

For a given $a$, $D(a)$ represents a measure of spread, or variation, for the data set. Since the value of $D(a)$ varies with $a$, it may be of interest to determine its minimum. We now show that $D(a)$ is minimized when $a = \mu^*$, where $\mu^*$ denotes the median of the data set. By definition, $\mu^*$ is a value that falls in the middle when the observations are arranged in order of magnitude. It is a measure of location like the mean. If we write $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ for the ordered $y_i$'s, then when $n$ is odd, $\mu^*$ is the unique value $y_{(n/2 + 1/2)}$, whereas when $n$ is even, $\mu^*$ is any value such that $y_{(n/2)} \le \mu^* \le y_{(n/2 + 1)}$. In the latter case, $\mu^*$ is sometimes chosen as the middle of the interval.

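There are several ways to show that $\mu^*$ minimizes $D(a)$. The following simple proof is due to Blyth (1990):

On the interval $y_{(k)} < a < y_{(k+1)}$, $k = 1, 2, \ldots, n - 1$, we have

$$\begin{aligned} D(a) &= \sum_{i=1}^{n} |y_{(i)} - a| \\ &= \sum_{i=1}^{k} (a - y_{(i)}) + \sum_{i=k+1}^{n} (y_{(i)} - a) \\ &= ka - \sum_{i=1}^{k} y_{(i)} + \sum_{i=k+1}^{n} y_{(i)} - (n - k)a. \end{aligned}$$

The function $D(a)$ is continuous for all $a$ and is differentiable everywhere except at $y_1, y_2, \ldots, y_n$. For $a \ne y_i$ ($i = 1, 2, \ldots, n$), the derivative $D'(a)$ is given by

$$D'(a) = 2(k - n/2),$$

where $k$ is the number of observations less than $a$ (so that $k = 0$ for $a < y_{(1)}$ and $k = n$ for $a > y_{(n)}$).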





If $k \ne n/2$, then $D'(a) \ne 0$ on $(y_{(k)}, y_{(k+1)})$, and by Corollary 4.2.1, $D(a)$ must be strictly monotone on $[y_{(k)}, y_{(k+1)}]$, $k = 1, 2, \ldots, n - 1$.

Now, when $n$ is odd, $D(a)$ is strictly monotone decreasing for $a \le y_{(n/2 + 1/2)}$, because $D'(a) < 0$ over $(y_{(k)}, y_{(k+1)})$ for $k < n/2$. It is strictly monotone increasing for $a \ge y_{(n/2 + 1/2)}$, because $D'(a) > 0$ over $(y_{(k)}, y_{(k+1)})$ for $k > n/2$. Hence, $\mu^* = y_{(n/2 + 1/2)}$ is a point of absolute minimum for $D(a)$. Furthermore, when $n$ is even, $D(a)$ is strictly monotone decreasing for $a \le y_{(n/2)}$, because $D'(a) < 0$ over $(y_{(k)}, y_{(k+1)})$ for $k < n/2$. Also, $D(a)$ is constant over $(y_{(n/2)}, y_{(n/2 + 1)})$, because $D'(a) = 0$ for $k = n/2$, and is strictly monotone increasing for $a \ge y_{(n/2 + 1)}$, because $D'(a) > 0$ over $(y_{(k)}, y_{(k+1)})$ for $k > n/2$. This indicates that $D(a)$ achieves its absolute minimum at any point $\mu^*$ such that $y_{(n/2)} \le \mu^* \le y_{(n/2 + 1)}$, which completes the proof.
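The result is easy to check numerically. The Python sketch below (helper names are hypothetical, data made up) evaluates $D(a)$ on a grid, confirms that the minimum for an odd-sized data set occurs at the median, shows that $D(a)$ is constant between the two middle order statistics for an even-sized one, and evaluates the slope formula $D'(a) = 2(k - n/2)$.

```python
def abs_dev(data, a):
    """D(a): sum of absolute deviations of the data from a."""
    return sum(abs(y - a) for y in data)


def abs_dev_slope(data, a):
    """D'(a) = 2(k - n/2) for a not equal to any observation,
    where k is the number of observations below a."""
    k = sum(1 for y in data if y < a)
    return 2 * (k - len(data) / 2)


odd = [3, 1, 7, 4, 10]          # ordered: 1, 3, 4, 7, 10; median 4
grid = [i / 100 for i in range(1101)]
best = min(grid, key=lambda a: abs_dev(odd, a))
print(best, abs_dev(odd, best))  # 4.0 13.0

even = [1, 3, 4, 7]             # D(a) is constant on [3, 4]
print([abs_dev(even, a) for a in (3.0, 3.5, 4.0)])        # [7.0, 7.0, 7.0]
print(abs_dev_slope(odd, 2.0), abs_dev_slope(even, 3.5))  # -3.0 0.0
```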


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Apostol, T. M. (1964). Mathematical Analysis. Addison-Wesley, Reading, Massachusetts. (Chap. 5 discusses differentiation of functions of one variable.)

Blyth, C. R. (1990). “Minimizing the sum of absolute deviations.” Amer. Statist., 44, 329.

Box, G. E. P., and D. R. Cox (1964). “An analysis of transformations.” J. Roy. Statist. Soc. Ser. B, 26, 211-243.

Box, G. E. P., and N. R. Draper (1987). Empirical Model-Building and Response Surfaces. Wiley, New York. (Chap. 2 introduces the idea of approximating the mean of a response variable using low-order polynomials; Chap. 3 discusses the method of least squares for fitting empirical models; the use of transformations, including those for stabilizing variances, is described in Chap. 8.)

Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters. Wiley, New York. (Various transformations are listed in Chap. 7, which include the Box-Cox and variance stabilizing transformations.)

Buck, R. C. (1956). Advanced Calculus. McGraw-Hill, New York. (Chap. 2 discusses the mean value theorem and l'Hospital's rule.)

Chatterjee, S., and B. Price (1977). Regression Analysis by Example. Wiley, New York. (Chap. 2 includes a discussion concerning variance stabilizing transformations, in addition to detection and removal of the effects of heteroscedasticity in regression analysis.)

Cooke, W. P. (1988). “L'Hôpital's rule in a Poisson derivation.” Amer. Math. Monthly, 95, 253-254.

Eggermont, P. P. B. (1988). “Noncentral difference quotients and the derivative.” Amer. Math. Monthly, 95, 551-553.

Eves, H. (1976). An Introduction to the History of Mathematics, 4th ed. Holt, Rinehart 
and Winston, New York. 

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York. (Differentiation is the subject of Chap. 4.)





Georgiev, A. A. (1984). “Kernel estimates of functions and their derivatives with applications.” Statist. Probab. Lett., 2, 45-50.

Hardy, G. H. (1955). A Course of Pure Mathematics, 10th ed. The University Press, Cambridge, England. (Chap. 6 covers differentiation and provides some interesting examples.)

Hogg, R. V., and A. T. Craig (1965). Introduction to Mathematical Statistics, 2nd ed. 
Macmillan, New York. (Chap. 4 discusses distributions of functions of random 
variables.) 

James, A. T., and R. A. J. Conyers (1985). “Estimation of a derivative by a difference quotient: Its application to hepatocyte lactate metabolism.” Biometrics, 41, 467-476.

Judge, G. G., W. E. Griffiths, R. C. Hill, and T. C. Lee (1980). The Theory and 
Practice of Econometrics. Wiley, New York. 

Khuri, A. I., and J. A. Cornell (1996). Response Surfaces, 2nd ed. Dekker, New York. (Chaps. 1 and 2 discuss the polynomial representation of a response surface and the method of least squares.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (Section 3.2 
discusses the development of the Poisson process.) 

Menon, V. V., B. Prasad, and R. S. Singh (1984). “Nonparametric recursive estimates of a probability density function and its derivatives.” J. Statist. Plann. Inference, 9, 73-82.

Montgomery, D. C., and E. A. Peck (1982). Introduction to Linear Regression Analysis. Wiley, New York. (Chap. 3 presents several methods useful for checking the validity of the basic regression assumptions. Several variance stabilizing transformations are also listed.)

Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco. (Chap. 1 introduces the definition of stochastic processes including the Poisson process.)

Roberts, A. W., and D. E. Varberg (1973). Convex Functions. Academic Press, New York. (Chap. 1 discusses a characterization of convex functions using derivatives; Chap. 5 discusses maxima and minima of differentiable functions.)

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, Reading, Massachusetts. (Chap. 3 discusses absolutely continuous random variables.)

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New York. (Differentiation is discussed in Chap. 5.)

Sagan, H. (1974). Advanced Calculus. Houghton Mifflin, Boston. (Chap. 3 discusses differentiation.)

Wetherill, G. B., P. Duncombe, M. Kenward, J. Köllerström, S. R. Paul, and B. J. Vowden (1986). Regression Analysis with Applications. Chapman and Hall, London, England. (Section 9.2 discusses the sources of heteroscedasticity in regression analysis.)

Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. (Chap. 9 considers limit 
theorems including asymptotic distributions of functions of the sample mean.) 





EXERCISES 
In Mathematics 

4.1. Let $f(x)$ be defined in a neighborhood of the origin. Show that if $f'(0)$ exists, then

$$\lim_{h \to 0} \frac{f(h) - f(-h)}{2h} = f'(0).$$

Give a counterexample to show that the converse is not true in general, that is, if the above limit exists, then it is not necessary that $f'(0)$ exists.

4.2. Let $f(x)$ and $g(x)$ have derivatives up to order $n$ on $[a, b]$. Let $h(x) = f(x)g(x)$. Show that

$$h^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x)\,g^{(n-k)}(x).$$

(This is known as Leibniz's formula.)

4.3. Suppose that $f(x)$ has a derivative at a point $x_0$, $a < x_0 < b$. Show that there exists a neighborhood $N_\delta(x_0)$ and a positive number $A$ such that

$$|f(x) - f(x_0)| < A\,|x - x_0|$$

for all $x \in N_\delta(x_0)$, $x \ne x_0$.

4.4. Suppose that $f(x)$ is differentiable on $(0, \infty)$ and $f'(x) \to 0$ as $x \to \infty$. Let $g(x) = f(x + 1) - f(x)$. Prove that $g(x) \to 0$ as $x \to \infty$.

4.5. Let the function $f(x)$ be defined as

$$f(x) = \begin{cases} x^3 - 2x, & x \ge 1, \\ ax^2 - bx + 1, & x < 1. \end{cases}$$

For what values of $a$ and $b$ does $f(x)$ have a continuous derivative?

4.6. Suppose that $f(x)$ is twice differentiable on $(0, \infty)$. Let $m_0$, $m_1$, $m_2$ be the least upper bounds of $|f(x)|$, $|f'(x)|$, and $|f''(x)|$, respectively, on $(0, \infty)$.





(a) Show that

$$|f'(x)| \le \frac{m_0}{h} + h\,m_2$$

for all $x$ in $(0, \infty)$ and for every $h > 0$.

(b) Deduce from (a) that

$$m_1^2 \le 4\,m_0\,m_2.$$

4.7. Suppose that $\lim_{x \to x_0} f'(x)$ exists. Does it follow that $f(x)$ is differentiable at $x_0$? Give a proof to show that the statement is correct or produce a counterexample to show that it is false.

4.8. Show that $D(a) = \sum_{i=1}^{n} |y_i - a|$ has no derivatives with respect to $a$ at $y_1, y_2, \ldots, y_n$.

4.9. Suppose that the function $f(x)$ is such that $f'(x)$ and $f''(x)$ are continuous in a neighborhood of the origin and satisfies $f(0) = 0$. Show that

$$\lim_{x \to 0} \frac{d}{dx}\left[\frac{f(x)}{x}\right] = \frac{1}{2}f''(0).$$

4.10. Show that if $f'(x)$ exists and is bounded for all $x$, then $f(x)$ is uniformly continuous on $R$, the set of real numbers.

4.11. Suppose that $g: R \to R$ and that $|g'(x)| \le M$ for all $x \in R$, where $M$ is a positive constant. Define $f(x) = x + c\,g(x)$, where $c$ is a positive constant. Show that it is possible to choose $c$ small enough so that $f$ is a one-to-one function.

4.12. Suppose that $f(x)$ is continuous on $[0, \infty)$, $f'(x)$ exists on $(0, \infty)$, $f(0) = 0$, and $f'(x)$ is monotone increasing on $(0, \infty)$. Show that $g(x)$ is monotone increasing on $(0, \infty)$, where $g(x) = f(x)/x$.

4.13. Show that if $a > 1$ and $m > 0$, then

(a) $\lim_{x \to \infty} (a^x/x^m) = \infty$,

(b) $\lim_{x \to \infty} [(\log x)/x^m] = 0$.

4.14. Apply l'Hospital's rule to find the limit

$$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x.$$





4.15. (a) Find $\lim_{x \to 0^+} (\sin x)^x$.

(b) Find $\lim_{x \to 0^+} (e^{-1/x}/x)$.

4.16. Show that

$$\lim_{x \to 0} [1 + ax + o(x)]^{1/x} = e^a,$$

where $a$ is a constant and $o(x)$ is any function whose order of magnitude is less than that of $x$ as $x \to 0$.

4.17. Consider the functions $f(x) = 4x^3 + 6x^2 - 10x + 2$ and $g(x) = 3x^4 + 4x^3 - 5x^2 + 1$. Show that

$$\frac{f'(x)}{g'(x)} \ne \frac{f(1) - f(0)}{g(1) - g(0)}$$

for any $x \in (0, 1)$. Does this contradict Cauchy's mean value theorem?

4.18. Suppose that $f(x)$ is differentiable for $a \le x \le b$. If $f'(a) < f'(b)$ and $\gamma$ is a number such that $f'(a) < \gamma < f'(b)$, show that there exists a $\xi$, $a < \xi < b$, for which $f'(\xi) = \gamma$ [a similar result holds if $f'(a) > f'(b)$]. [Hint: Consider the function $g(x) = f(x) - \gamma(x - a)$. Show that $g(x)$ has a minimum at $\xi$.]

4.19. Suppose that $f(x)$ is differentiable on $(a, b)$. Let $x_1, x_2, \ldots, x_n$ be in $(a, b)$, and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be positive numbers such that $\sum_{i=1}^{n} \lambda_i = 1$. Show that there exists a point $c$ in $(a, b)$ such that

$$\sum_{i=1}^{n} \lambda_i f'(x_i) = f'(c).$$

[Note: This is a generalization of the result in Exercise 4.18.]

4.20. Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ be in $(a, b)$ such that $x_i < y_i$ ($i = 1, 2, \ldots, n$). Show that if $f(x)$ is differentiable on $(a, b)$, then there exists a point $c$ in $(a, b)$ such that

$$\sum_{i=1}^{n} [f(y_i) - f(x_i)] = f'(c) \sum_{i=1}^{n} (y_i - x_i).$$


4.21. Give a Maclaurin's series expansion of the function $f(x) = \log(1 + x)$.

4.22. Discuss the maxima and minima of the function $f(x) = (x^4 + 3)/(x^2 + 2)$.





4.23. Determine if f(x) = e^^/x has an absolute minimum on $(0, \infty)$.

4.24. For what values of $a$ and $b$ is the function

$$f(x) = \frac{1}{x^2 + ax + b}$$

bounded on the interval $[-1, 1]$? Find the absolute maximum on that interval.


In Statistics 


4.25. Let $Y$ be a continuous random variable whose cumulative distribution function, $F(y)$, is strictly monotone. Let $G(y)$ be another strictly monotone, continuous cumulative distribution function. Show that the cumulative distribution function of the random variable $G^{-1}[F(Y)]$ is $G(y)$.

4.26. Let $Y$ have the cumulative distribution function

$$F(y) = \begin{cases} 1 - e^{-y}, & y \ge 0, \\ 0, & y < 0. \end{cases}$$

Find the density function of $W = \sqrt{Y}$.


4.27. Let $Y$ be normally distributed with mean 1 and variance 0.04. Let
W=Y\ 

(a) Find the density function of W. 

(b) Find the exact mean and variance of W. 

(c) Find approximate values for the mean and variance of W using 
Taylor’s expansion, and compare the results with those of (b). 


4.28. Let $Z$ be normally distributed with mean 0 and variance 1. Let $Y = Z^2$. Find the density function of $Y$.

[Note: The function $\psi(z) = z^2$ is not strictly monotone for all $z$.]


4.29. Let $X$ be a random variable that denotes the age at failure of a component. The failure rate is defined as the probability of failure in a finite interval of time, say of length $h$, given the age of the component, say $x$. This failure rate is therefore equal to

$$P(x \le X \le x + h \mid X \ge x).$$

Consider the following limit:

$$\lim_{h \to 0} \frac{1}{h}\,P(x \le X \le x + h \mid X \ge x).$$





If this limit exists, then it is called the hazard rate, or instantaneous failure rate.

(a) Give an expression for the failure rate in terms of $F(x)$, the cumulative distribution function of $X$.

(b) Suppose that $X$ has the exponential distribution with the cumulative distribution function

$$F(x) = 1 - e^{-x/\sigma}, \quad x \ge 0,$$

where $\sigma$ is a positive constant. Show that $X$ has a constant hazard rate.

(c) Show that any random variable with a constant hazard rate must have the exponential distribution.

4.30. Consider a Poisson process with parameter $\lambda$ over the interval $(0, t)$. Divide this interval into $n$ equal subintervals of length $h = t/n$. We consider that we have a “success” in a given subinterval if one arrival occurs in that subinterval. If there are no arrivals, then we consider that we have a “failure.” Let $Y_n$ denote the number of “successes” in the $n$ subintervals of length $h$. Then we have approximately

$$P(Y_n = r) = \binom{n}{r} p_n^r (1 - p_n)^{n-r}, \quad r = 0, 1, \ldots, n,$$

where $p_n$ is approximately equal to $\lambda h = \lambda t/n$. Show that

$$\lim_{n \to \infty} P(Y_n = r) = \frac{e^{-\lambda t}(\lambda t)^r}{r!}.$$


CHAPTER 5 


Infinite Sequences and Series


The study of the theory of infinite sequences and series is an integral part of advanced calculus. All limiting processes, such as differentiation and integration, can be investigated on the basis of this theory.

The first example of an infinite series is attributed to Archimedes, who showed that the sum

$$1 + \frac{1}{4} + \cdots + \frac{1}{4^n}$$

was less than $\frac{4}{3}$ for any value of $n$. However, it was not until the nineteenth century that the theory of infinite series was firmly established by Augustin-Louis Cauchy (1789-1857).

In this chapter we shall study the theory of infinite sequences and series, and investigate their convergence. Unless otherwise stated, the terms of all sequences and series considered in this chapter are real-valued.


5.1. INFINITE SEQUENCES 

In Chapter 1 we introduced the general concept of a function. An infinite sequence is a particular function $f$ defined on the set of all positive integers. For a given $n$ the value of this function, namely $f(n)$, is called the $n$th term of the infinite sequence and is denoted by $a_n$. The sequence itself is denoted by the symbol $\{a_n\}_{n=1}^{\infty}$. In some cases, the integer with which the infinite sequence begins is different from one. For example, it may be equal to zero or to some other integer. For the sake of simplicity, an infinite sequence will be referred to as just a sequence.

Since a sequence is a function, then, in particular, the sequence $\{a_n\}_{n=1}^{\infty}$ can have the following properties:

1. It is bounded if there exists a constant $K > 0$ such that $|a_n| \le K$ for all $n$.




2. It is monotone increasing if $a_n \le a_{n+1}$ for all $n$, and is monotone decreasing if $a_n \ge a_{n+1}$ for all $n$.

3. It converges to a finite number $c$ if $\lim_{n \to \infty} a_n = c$, that is, for a given $\epsilon > 0$ there exists an integer $N$ such that

$$|a_n - c| < \epsilon \quad \text{if } n > N.$$

In this case, $c$ is called the limit of the sequence and this fact is denoted by writing $a_n \to c$ as $n \to \infty$. If the sequence does not converge to a finite limit, then it is said to be divergent.

4. It is said to oscillate if it does not converge to a finite limit, nor to $+\infty$ or $-\infty$ as $n \to \infty$.


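Example 5.1.1. Let $a_n = (n^2 + 2n)/(2n^2 + 3)$. Then $a_n \to \frac{1}{2}$ as $n \to \infty$, since

$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} \frac{1 + 2/n}{2 + 3/n^2} = \frac{1}{2}.$$

Example 5.1.2. Consider $a_n = \sqrt{n + 1} - \sqrt{n}$. This sequence converges to zero, since

$$a_n = \frac{(\sqrt{n + 1} - \sqrt{n})(\sqrt{n + 1} + \sqrt{n})}{\sqrt{n + 1} + \sqrt{n}} = \frac{1}{\sqrt{n + 1} + \sqrt{n}}.$$

Hence, $a_n \to 0$ as $n \to \infty$.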

Example 5.1.3. Suppose that $a_n = 2^n/n^2$. Here, the sequence is divergent, since by Example 4.2.3,

$$\lim_{n \to \infty} \frac{2^n}{n^2} = \infty.$$


Example 5.1.4. Let $a_n = (-1)^n$. This sequence oscillates, since it is equal to 1 when $n$ is even and to $-1$ when $n$ is odd.

Theorem 5.1.1. Every convergent sequence is bounded.

Proof. Suppose that $\{a_n\}_{n=1}^{\infty}$ converges to $c$. Then, there exists an integer $N$ such that

$$|a_n - c| < 1 \quad \text{if } n > N.$$





For such values of $n$, we have

$$|a_n| < \max(|c - 1|, |c + 1|).$$

It follows that

$$|a_n| < K$$

for all $n$, where

$$K = \max(|a_1| + 1, |a_2| + 1, \ldots, |a_N| + 1, |c - 1|, |c + 1|).$$

□

The converse of Theorem 5.1.1 is not necessarily true. That is, if a sequence is bounded, then it does not have to be convergent. As a counterexample, consider the sequence given in Example 5.1.4. This sequence is bounded, but is not convergent. To guarantee convergence of a bounded sequence we obviously need an additional condition.

Theorem 5.1.2. Every bounded monotone (increasing or decreasing) sequence converges.

Proof. Suppose that $\{a_n\}_{n=1}^{\infty}$ is a bounded and monotone increasing sequence (the proof is similar if the sequence is monotone decreasing). Since the sequence is bounded, it must be bounded from above and hence has a least upper bound $c$ (see Theorem 1.5.1). Thus $a_n \le c$ for all $n$. Furthermore, for any given $\epsilon > 0$ there exists an integer $N$ such that

$$c - \epsilon < a_N \le c;$$

otherwise $c - \epsilon$ would be an upper bound of $\{a_n\}_{n=1}^{\infty}$. Now, because the sequence is monotone increasing,

$$c - \epsilon < a_N \le a_{N+1} \le a_{N+2} \le \cdots \le c,$$

that is,

$$c - \epsilon < a_n \le c \quad \text{for } n \ge N.$$

We can write

$$c - \epsilon < a_n < c + \epsilon,$$

or equivalently,

$$|a_n - c| < \epsilon \quad \text{if } n \ge N.$$

This indicates that $\{a_n\}_{n=1}^{\infty}$ converges to $c$. □





Using Theorem 5.1.2 it is easy to prove the following corollary.

Corollary 5.1.1.

1. If $\{a_n\}_{n=1}^{\infty}$ is bounded from above and is monotone increasing, then $\{a_n\}_{n=1}^{\infty}$ converges to $c = \sup_{n \ge 1} a_n$.

2. If $\{a_n\}_{n=1}^{\infty}$ is bounded from below and is monotone decreasing, then $\{a_n\}_{n=1}^{\infty}$ converges to $d = \inf_{n \ge 1} a_n$.

Example 5.1.5. Consider the sequence $\{a_n\}_{n=1}^{\infty}$, where $a_1 = \sqrt{2}$ and $a_{n+1} = \sqrt{2 + \sqrt{a_n}}$ for $n \ge 1$. This sequence is bounded, since $a_n < 2$ for all $n$, as can be easily shown using mathematical induction: We have $a_1 = \sqrt{2} < 2$. If $a_n < 2$, then $a_{n+1} < \sqrt{2 + \sqrt{2}} < 2$. Furthermore, the sequence is monotone increasing, since $a_n < a_{n+1}$ for $n = 1, 2, \ldots$, which can also be shown by mathematical induction. Hence, by Theorem 5.1.2, $\{a_n\}_{n=1}^{\infty}$ must converge. To find its limit, we note that

$$\lim_{n \to \infty} a_{n+1} = \lim_{n \to \infty} \sqrt{2 + \sqrt{a_n}} = \sqrt{2 + \sqrt{\lim_{n \to \infty} a_n}}.$$

If $c$ denotes the limit of $a_n$ as $n \to \infty$, then

$$c = \sqrt{2 + \sqrt{c}}.$$

By solving this equation under the condition $c > 0$, we find that the only solution is $c \approx 1.831$.
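The limit in Example 5.1.5 can be confirmed by simply iterating the recursion. In the Python sketch below (the function name is hypothetical), the terms increase, stay below 2, and settle at about 1.831, which satisfies $c = \sqrt{2 + \sqrt{c}}$ to machine precision.

```python
import math


def example_5_1_5_terms(count=60):
    """Return a_1, ..., a_count for a_1 = sqrt(2), a_{n+1} = sqrt(2 + sqrt(a_n))."""
    terms = [math.sqrt(2)]
    while len(terms) < count:
        terms.append(math.sqrt(2 + math.sqrt(terms[-1])))
    return terms


terms = example_5_1_5_terms()
print(terms[:5])
c = terms[-1]
print(round(c, 4), c - math.sqrt(2 + math.sqrt(c)))
```

Convergence is fast here because the map $x \mapsto \sqrt{2 + \sqrt{x}}$ has a derivative of roughly 0.1 near the limit, so each step shrinks the error by about a factor of ten.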

Definition 5.1.1. Consider the sequence $\{a_n\}_{n=1}^{\infty}$. An infinite collection of its terms, picked out in a manner that preserves the original order of the terms of the sequence, is called a subsequence of $\{a_n\}_{n=1}^{\infty}$. More formally, any sequence of the form $\{b_n\}_{n=1}^{\infty}$, where $b_n = a_{k_n}$ such that $k_1 < k_2 < \cdots < k_n < \cdots$, is a subsequence of $\{a_n\}_{n=1}^{\infty}$. Note that $k_n \ge n$ for $n \ge 1$. □

Theorem 5.1.3. A sequence $\{a_n\}_{n=1}^{\infty}$ converges to $c$ if and only if every subsequence of $\{a_n\}_{n=1}^{\infty}$ converges to $c$.

Proof. The proof is left to the reader. □

It should be noted that if a sequence diverges, then it does not necessarily 
follow that every one of its subsequences must diverge. A sequence may fail 
to converge, yet several of its subsequences converge. For example, the 





sequence whose nth tenu is = ( — !)” is divergent, as was seen earlier. 
However, the two subsequences and where =a. 2 n = 1 

c„ = Ü 2 n-i = — 1 (n = 1, 2, . . . ), are both convergent. 

We have noted earlier that a bounded sequence may not converge. It is possible, however, that one of its subsequences is convergent. This is shown in the next theorem.

Theorem 5.1.4. Every bounded sequence has a convergent subsequence.

Proof. Suppose that $\{a_n\}_{n=1}^{\infty}$ is a bounded sequence. Without loss of generality we can consider that the number of distinct terms of the sequence is infinite. (If this is not the case, then there exists an infinite subsequence of $\{a_n\}_{n=1}^{\infty}$ that consists of terms that are equal. Obviously, such a subsequence converges.) Let $G$ denote the set consisting of all terms of the sequence. Then $G$ is a bounded infinite set. By Theorem 1.6.2, $G$ must have a limit point, say $c$. Also, by Theorem 1.6.1, every neighborhood of $c$ must contain infinitely many points of $G$. It follows that we can find integers $k_1 < k_2 < \cdots < k_n < \cdots$ such that

$$|a_{k_n} - c| < \frac{1}{n} \quad \text{for } n = 1, 2, \ldots.$$

Thus for a given $\epsilon > 0$ there exists an integer $N > 1/\epsilon$ such that $|a_{k_n} - c| < \epsilon$ if $n > N$. This indicates that the subsequence $\{a_{k_n}\}_{n=1}^{\infty}$ converges to $c$. □


We conclude from Theorem 5.1.4 that a bounded sequence can have several convergent subsequences. The limit of each of these subsequences is called a subsequential limit. Let $E$ denote the set of all subsequential limits of $\{a_n\}_{n=1}^{\infty}$. This set is bounded, since the sequence is bounded (why?).

Definition 5.1.2. Let $\{a_n\}_{n=1}^{\infty}$ be a bounded sequence, and let $E$ be the set of all its subsequential limits. Then the least upper bound of $E$ is called the upper limit of $\{a_n\}_{n=1}^{\infty}$ and is denoted by $\limsup_{n \to \infty} a_n$. Similarly, the greatest lower bound of $E$ is called the lower limit of $\{a_n\}_{n=1}^{\infty}$ and is denoted by $\liminf_{n \to \infty} a_n$. For example, the sequence $\{a_n\}_{n=1}^{\infty}$, where $a_n = (-1)^n[1 + (1/n)]$, has two subsequential limits, namely $-1$ and $1$. Thus $E = \{-1, 1\}$, and $\limsup_{n \to \infty} a_n = 1$, $\liminf_{n \to \infty} a_n = -1$. □
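A convenient way to approximate the upper and lower limits numerically is through tail suprema and infima, using the standard equivalent characterization $\limsup_{n\to\infty} a_n = \lim_{n\to\infty} \sup_{k \ge n} a_k$ (and similarly for $\liminf$). This characterization is not the one used in Definition 5.1.2, and a finite window only gives a heuristic approximation; the Python sketch below assumes both.

```python
def tail_sup_inf(a, n, window=2000):
    """Approximate sup and inf of the tail {a(k): k >= n} using only n <= k < n + window."""
    values = [a(k) for k in range(n, n + window)]
    return max(values), min(values)

def a(k):
    # a_k = (-1)^k (1 + 1/k), the example following Definition 5.1.2
    return (-1) ** k * (1 + 1 / k)

for n in (10, 100, 1000, 10000):
    sup_tail, inf_tail = tail_sup_inf(a, n)
    print(n, sup_tail, inf_tail)   # approach 1 and -1, the upper and lower limits
```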

Theorem 5.1.5. The sequence $\{a_n\}_{n=1}^{\infty}$ converges to $c$ if and only if

$$\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = c.$$

Proof. The proof is left to the reader. □





Theorem 5.1.5 implies that when a sequence converges, the set of all its subsequential limits consists of a single element, namely the limit of the sequence.


5.1.1. The Cauchy Criterion

We have seen earlier that the definition of convergence of a sequence $\{a_n\}_{n=1}^{\infty}$ requires finding the limit of $a_n$ as $n \to \infty$. In some cases, such a limit may be difficult to figure out. For example, consider the sequence whose $n$th term is

$$a_n = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots + \frac{(-1)^{n-1}}{2n-1}, \qquad n = 1, 2, \ldots. \tag{5.1}$$

It is not easy to calculate the limit of $a_n$ in order to find out if the sequence converges. Fortunately, however, there is another convergence criterion for sequences, known as the Cauchy criterion after Augustin-Louis Cauchy (it was known earlier to Bernhard Bolzano, 1781-1848, a Czechoslovakian priest whose mathematical work was undeservedly overlooked by his lay and clerical contemporaries; see Boyer, 1968, page 566).


Theorem 5.1.6 (The Cauchy Criterion). The sequence $\{a_n\}_{n=1}^{\infty}$ converges if and only if it satisfies the following condition, known as the $\epsilon$-condition: For each $\epsilon > 0$ there is an integer $N$ such that

$$|a_m - a_n| < \epsilon \quad \text{for all } m > N,\ n > N.$$

Proof. Necessity: If the sequence converges, then it must satisfy the $\epsilon$-condition. Let $\epsilon > 0$ be given. Since the sequence $\{a_n\}_{n=1}^{\infty}$ converges, there exist a number $c$ and an integer $N$ such that

$$|a_n - c| < \frac{\epsilon}{2} \quad \text{if } n > N.$$

Hence, for $m > N$, $n > N$ we must have

$$|a_m - a_n| = |a_m - c + c - a_n| \le |a_m - c| + |a_n - c| < \epsilon.$$

Sufficiency: If the sequence satisfies the $\epsilon$-condition, then it must converge. If the $\epsilon$-condition is satisfied, then for a given $\epsilon > 0$ there is an integer $N$ such that

$$|a_n - a_{N+1}| < \epsilon$$

for all values of $n \ge N + 1$. Thus for such values of $n$,

$$a_{N+1} - \epsilon < a_n < a_{N+1} + \epsilon. \tag{5.2}$$

The sequence $\{a_n\}_{n=1}^{\infty}$ is therefore bounded, since from the double inequality (5.2) we can assert that

$$|a_n| \le \max\left(|a_1|, |a_2|, \ldots, |a_N|, |a_{N+1} - \epsilon|, |a_{N+1} + \epsilon|\right)$$

for all $n$. By Theorem 5.1.4, $\{a_n\}_{n=1}^{\infty}$ has a convergent subsequence $\{a_{k_n}\}_{n=1}^{\infty}$. Let $c$ be the limit of this subsequence. If we invoke again the $\epsilon$-condition, we can find an integer $N'$ such that

$$|a_m - a_{k_n}| < \epsilon' \quad \text{if } m > N',\ k_n \ge n > N',$$

where $\epsilon' < \epsilon$. By fixing $m$ and letting $n \to \infty$ we get

$$|a_m - c| \le \epsilon' < \epsilon \quad \text{if } m > N'.$$

This indicates that the sequence $\{a_n\}_{n=1}^{\infty}$ is convergent and has $c$ as its limit. □


Definition 5.1.3. A sequence $\{a_n\}_{n=1}^{\infty}$ that satisfies the $\epsilon$-condition of the Cauchy criterion is said to be a Cauchy sequence. □


Example 5.1.6. With the help of the Cauchy criterion it is now possible to show that the sequence $\{a_n\}_{n=1}^{\infty}$ whose $n$th term is defined by formula (5.1) is a Cauchy sequence and is therefore convergent. To do so, let $m > n$. Then

$$a_m - a_n = (-1)^n \left[ \frac{1}{2n+1} - \frac{1}{2n+3} + \cdots + \frac{(-1)^{p-1}}{2n+2p-1} \right], \tag{5.3}$$

where $p = m - n$. We claim that the quantity inside brackets in formula (5.3) is positive. This can be shown by grouping successive terms in pairs. Thus if $p$ is even, the quantity is equal to

$$\left(\frac{1}{2n+1} - \frac{1}{2n+3}\right) + \left(\frac{1}{2n+5} - \frac{1}{2n+7}\right) + \cdots + \left(\frac{1}{2n+2p-3} - \frac{1}{2n+2p-1}\right),$$

which is positive, since the difference inside each parenthesis is positive. If $p = 1$, the quantity is obviously positive, since it is then equal to $1/(2n+1)$. If $p \ge 3$ is an odd integer, the quantity can be written as

$$\left(\frac{1}{2n+1} - \frac{1}{2n+3}\right) + \left(\frac{1}{2n+5} - \frac{1}{2n+7}\right) + \cdots + \left(\frac{1}{2n+2p-5} - \frac{1}{2n+2p-3}\right) + \frac{1}{2n+2p-1},$$

which is also positive. Hence, for any $p$,

$$|a_m - a_n| = \frac{1}{2n+1} - \frac{1}{2n+3} + \cdots + \frac{(-1)^{p-1}}{2n+2p-1}. \tag{5.4}$$

We now claim that

$$|a_m - a_n| \le \frac{1}{2n+1}.$$

To prove this claim, let us again consider two cases. If $p$ is even, then

$$|a_m - a_n| = \frac{1}{2n+1} - \left(\frac{1}{2n+3} - \frac{1}{2n+5}\right) - \cdots - \left(\frac{1}{2n+2p-5} - \frac{1}{2n+2p-3}\right) - \frac{1}{2n+2p-1} < \frac{1}{2n+1}, \tag{5.5}$$

since all the quantities inside parentheses in (5.5) are positive. If $p$ is odd, then

$$|a_m - a_n| = \frac{1}{2n+1} - \left(\frac{1}{2n+3} - \frac{1}{2n+5}\right) - \cdots - \left(\frac{1}{2n+2p-3} - \frac{1}{2n+2p-1}\right) \le \frac{1}{2n+1},$$

with equality only when $p = 1$, which proves our claim. On the basis of this result we can assert that for a given $\epsilon > 0$,

$$|a_m - a_n| \le \frac{1}{2n+1} < \epsilon \quad \text{if } m > n \ge N,$$

where $N$ is such that

$$\frac{1}{2N+1} < \epsilon,$$

or equivalently,

$$N > \frac{1}{2\epsilon} - \frac{1}{2}.$$

This shows that $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence.
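The key estimate of Example 5.1.6, $|a_m - a_n| \le 1/(2n+1)$ for $m > n$, can be spot-checked by brute force over all pairs up to a modest bound. The Python sketch below does this; it is a numerical sanity check, not a proof. It also compares $a_{300}$ with $\pi/4$, which is the classical value of this series (a known fact not stated in the text).

```python
import math

def leibniz_partial_sums(n_max):
    """Return [a_0, a_1, ..., a_{n_max}] with a_n = sum_{i=1}^{n} (-1)**(i-1) / (2i - 1), a_0 = 0."""
    sums = [0.0]
    for i in range(1, n_max + 1):
        sums.append(sums[-1] + (-1) ** (i - 1) / (2 * i - 1))
    return sums

a = leibniz_partial_sums(300)

# Largest value of |a_m - a_n| * (2n + 1) over all pairs 1 <= n < m <= 300.
# The bound derived above says this never exceeds 1 (equality when m = n + 1).
worst = max(abs(a[m] - a[n]) * (2 * n + 1)
            for n in range(1, 300) for m in range(n + 1, 301))
print(worst)
print(a[300], math.pi / 4)
```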





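Example 5.1.7. Consider the sequence $\{a_n\}_{n=1}^{\infty}$, where

$$a_n = (-1)^n \left(1 + \frac{1}{n}\right).$$

We have seen earlier that $\liminf_{n \to \infty} a_n = -1$ and $\limsup_{n \to \infty} a_n = 1$. Thus by Theorem 5.1.5 this sequence is not convergent. We can arrive at the same conclusion using the Cauchy criterion by showing that the $\epsilon$-condition is not satisfied. This occurs whenever we can find an $\epsilon > 0$ such that, however $N$ may be chosen,

$$|a_m - a_n| \ge \epsilon$$

for some $m > N$, $n > N$. In our example, if $N$ is any positive integer, then the inequality

$$|a_m - a_n| > 2 \tag{5.6}$$

can be satisfied by choosing $m = \nu$ and $n = \nu + 1$, where $\nu$ is an odd integer greater than $N$. Indeed, for such a choice $|a_\nu - a_{\nu+1}| = 2 + 1/\nu + 1/(\nu + 1)$, so the $\epsilon$-condition fails with $\epsilon = 2$.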
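The witness pair used in Example 5.1.7 is easy to construct mechanically. In the Python sketch below, the helper name `witness_pair` is hypothetical; for any $N$ it picks the smallest odd $\nu > N$ and returns $(m, n) = (\nu, \nu + 1)$, for which $|a_m - a_n| > 2$.

```python
def a(n):
    # a_n = (-1)^n (1 + 1/n), the sequence of Example 5.1.7
    return (-1) ** n * (1 + 1 / n)

def witness_pair(N):
    """Return (m, n) with m, n > N and |a_m - a_n| > 2, using m = v, n = v + 1 for an odd v > N."""
    v = N + 1 if N % 2 == 0 else N + 2   # smallest odd integer greater than N
    return v, v + 1

for N in (1, 10, 1000, 10**6):
    m, n = witness_pair(N)
    print(N, m, n, abs(a(m) - a(n)))     # always slightly larger than 2
```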


5.2. INFINITE SERIES

Let $\{a_n\}_{n=1}^{\infty}$ be a given sequence. Consider the symbolic expression

$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + \cdots + a_n + \cdots. \tag{5.7}$$

By definition, this expression is called an infinite series, or just a series for simplicity, and $a_n$ is referred to as the $n$th term of the series. The finite sum

$$s_n = \sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n, \qquad n = 1, 2, \ldots,$$

is called the $n$th partial sum of the series.

Definition 5.2.1. Consider the series $\sum_{n=1}^{\infty} a_n$. Let $s_n$ be its $n$th partial sum $(n = 1, 2, \ldots)$.

1. The series is said to be convergent if the sequence $\{s_n\}_{n=1}^{\infty}$ converges. In this case, if $\lim_{n \to \infty} s_n = s$, where $s$ is finite, then we say that the series converges to $s$, or that $s$ is the sum of the series. Symbolically, this is expressed by writing

$$s = \sum_{n=1}^{\infty} a_n.$$

2. If $s_n$ does not tend to a finite limit, then the series is said to be divergent. □

Definition 5.2.1 formulates convergence of a series in terms of convergence of the associated sequence of its partial sums. By applying the Cauchy criterion (Theorem 5.1.6) to the latter sequence, we arrive at the following condition of convergence for a series:

Theorem 5.2.1. The series $\sum_{n=1}^{\infty} a_n$ converges if and only if for a given $\epsilon > 0$ there is an integer $N$ such that

$$\left| \sum_{i=m+1}^{n} a_i \right| < \epsilon \quad \text{for all } n > m > N. \tag{5.8}$$

Inequality (5.8) follows from applying Theorem 5.1.6 to the sequence $\{s_n\}_{n=1}^{\infty}$ of partial sums of the series and noting that

$$|s_n - s_m| = \left| \sum_{i=m+1}^{n} a_i \right| \quad \text{for } n > m.$$

In particular, if $n = m + 1$, then inequality (5.8) becomes

$$|a_{m+1}| < \epsilon \tag{5.9}$$

for all $m > N$. This implies that $\lim_{m \to \infty} a_{m+1} = 0$, and hence $\lim_{n \to \infty} a_n = 0$.

We therefore conclude the following result:

Result 5.2.1. If $\sum_{n=1}^{\infty} a_n$ is a convergent series, then $\lim_{n \to \infty} a_n = 0$.

It is important here to note that the convergence of the $n$th term of a series to zero as $n \to \infty$ is a necessary condition for the convergence of the series. It is not, however, a sufficient condition; that is, if $\lim_{n \to \infty} a_n = 0$, then it does not follow that $\sum_{n=1}^{\infty} a_n$ converges. For example, as we shall see later, the series $\sum_{n=1}^{\infty} (1/n)$ is divergent, and its $n$th term goes to zero as $n \to \infty$. It is true, however, that if $\lim_{n \to \infty} a_n \ne 0$, then $\sum_{n=1}^{\infty} a_n$ is divergent. This follows from applying the law of contraposition to the necessary condition of convergence. We conclude the following:

1. If $a_n \to 0$ as $n \to \infty$, then no conclusion can be reached regarding convergence or divergence of $\sum_{n=1}^{\infty} a_n$.

2. If $a_n \not\to 0$ as $n \to \infty$, then $\sum_{n=1}^{\infty} a_n$ is divergent. For example, the series $\sum_{n=1}^{\infty} [n/(n+1)]$ is divergent, since

$$\lim_{n \to \infty} \frac{n}{n+1} = 1 \ne 0.$$

Example 5.2.1. One of the simplest series is the geometric series, $\sum_{n=1}^{\infty} a^n$. This series is divergent if $|a| \ge 1$, since $a^n \not\to 0$ as $n \to \infty$. It is convergent if $|a| < 1$ by the Cauchy criterion: Let $n > m$. Then

$$s_n - s_m = a^{m+1} + a^{m+2} + \cdots + a^n. \tag{5.10}$$

By multiplying the two sides of (5.10) by $a$, we get

$$a(s_n - s_m) = a^{m+2} + a^{m+3} + \cdots + a^{n+1}. \tag{5.11}$$

If we now subtract (5.11) from (5.10), we obtain

$$s_n - s_m = \frac{a^{m+1} - a^{n+1}}{1 - a}. \tag{5.12}$$

Since $|a| < 1$, we can find an integer $N$ such that for $m > N$, $n > N$,

$$|a|^{m+1} < \frac{\epsilon(1-a)}{2}, \qquad |a|^{n+1} < \frac{\epsilon(1-a)}{2}.$$

Hence, for a given $\epsilon > 0$,

$$|s_n - s_m| < \epsilon \quad \text{if } n > m > N.$$

Formula (5.12) can actually be used to find the sum of the geometric series when $|a| < 1$. Let $m = 1$. By taking the limits of both sides of (5.12) as $n \to \infty$ we get

$$\lim_{n \to \infty} s_n = s_1 + \frac{a^2}{1-a} = a + \frac{a^2}{1-a} = \frac{a}{1-a},$$

since

$$\lim_{n \to \infty} \frac{a^{n+1}}{1-a} = 0.$$
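Formula (5.12) and the sum $a/(1-a)$ can be compared against direct summation. The following Python sketch (function names are illustrative) evaluates partial sums term by term and checks them against the closed forms for a few values of $a$ with $|a| < 1$, including a negative one.

```python
def partial_sum(a, n):
    """s_n = a + a^2 + ... + a^n, summed term by term."""
    return sum(a ** i for i in range(1, n + 1))

def closed_form_difference(a, m, n):
    """Formula (5.12): s_n - s_m = (a^(m+1) - a^(n+1)) / (1 - a) for n > m."""
    return (a ** (m + 1) - a ** (n + 1)) / (1 - a)

for a in (0.5, -0.9, 0.99):
    direct = partial_sum(a, 40) - partial_sum(a, 10)
    print(a, direct, closed_form_difference(a, 10, 40))   # agree up to rounding
    print(a, partial_sum(a, 2000), a / (1 - a))           # partial sum vs. a/(1 - a)
```

Note how slowly the case $a = 0.99$ settles: the error after $n$ terms is $a^{n+1}/(1-a)$, exactly the quantity that (5.12) says must vanish.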





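Example 5.2.2. Consider the series $\sum_{n=1}^{\infty} (1/n!)$. This series converges by the Cauchy criterion. To show this, we first note that

$$n! = n(n-1)(n-2) \times \cdots \times 3 \times 2 \times 1 \ge 2^{n-1} \quad \text{for } n = 1, 2, \ldots.$$

Hence, for $n > m$,

$$|s_n - s_m| = \sum_{i=m+1}^{n} \frac{1}{i!} \le \sum_{i=m+1}^{n} \frac{1}{2^{i-1}} = 2 \sum_{i=m+1}^{n} \left(\frac{1}{2}\right)^{i}.$$

The last sum is the quantity in (5.10) for a convergent geometric series with $a = \frac{1}{2} < 1$. Consequently, $|s_n - s_m|$ can be made smaller than any given $\epsilon > 0$ by choosing $m$ and $n$ large enough.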
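The two ingredients of Example 5.2.2, the bound $n! \ge 2^{n-1}$ and the resulting geometric bound on $s_n - s_m$, can be checked with exact integer factorials. The Python sketch below also compares a partial sum with $e - 1$, the classical value of $\sum_{n \ge 1} 1/n!$ (a known fact not stated in the text).

```python
import math

def factorial_bound_holds(n):
    """Check n! >= 2^(n-1) exactly with Python integers."""
    return math.factorial(n) >= 2 ** (n - 1)

def tail_and_bound(m, n):
    """Return sum_{i=m+1}^n 1/i! and the geometric bound 2 * sum_{i=m+1}^n (1/2)^i."""
    tail = sum(1 / math.factorial(i) for i in range(m + 1, n + 1))
    bound = 2 * sum(0.5 ** i for i in range(m + 1, n + 1))
    return tail, bound

print(all(factorial_bound_holds(n) for n in range(1, 60)))
print(tail_and_bound(5, 25))
print(sum(1 / math.factorial(i) for i in range(1, 25)), math.e - 1)
```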

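Theorem 5.2.2. If $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are two convergent series, and if $c$ is a constant, then the following series are also convergent:

1. $\sum_{n=1}^{\infty} (c\, a_n) = c \sum_{n=1}^{\infty} a_n$.

2. $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$.

Proof. The proof is left to the reader. □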

Definition 5.2.2. The series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if $\sum_{n=1}^{\infty} |a_n|$ is convergent. □

For example, the series $\sum_{n=1}^{\infty} [(-1)^n / n!]$ is absolutely convergent, since $\sum_{n=1}^{\infty} (1/n!)$ is convergent, as was seen in Example 5.2.2.

Theorem 5.2.3. Every absolutely convergent series is convergent.

Proof. Consider the series $\sum_{n=1}^{\infty} a_n$, and suppose that $\sum_{n=1}^{\infty} |a_n|$ is convergent. We have that

$$\left| \sum_{i=m+1}^{n} a_i \right| \le \sum_{i=m+1}^{n} |a_i|. \tag{5.13}$$

By applying the Cauchy criterion to $\sum_{n=1}^{\infty} |a_n|$ we find that, for a given $\epsilon > 0$, there is an integer $N$ such that

$$\sum_{i=m+1}^{n} |a_i| < \epsilon \quad \text{if } n > m > N. \tag{5.14}$$

From (5.13) and (5.14) we conclude that $\sum_{n=1}^{\infty} a_n$ satisfies the Cauchy criterion and is therefore convergent by Theorem 5.2.1. □


Note that it is possible that $\sum_{n=1}^{\infty} a_n$ is convergent while $\sum_{n=1}^{\infty} |a_n|$ is divergent. In this case, the series $\sum_{n=1}^{\infty} a_n$ is said to be conditionally convergent. Examples of this kind of series will be seen later.

In the next section we shall discuss convergence of series whose terms are positive.


5.2.1. Tests of Convergence for Series of Positive Terms

Suppose that the terms of the series $\sum_{n=1}^{\infty} a_n$ are such that $a_n > 0$ for $n > K$, where $K$ is a constant. Without loss of generality we shall consider that $K = 1$. Such a series is called a series of positive terms.

Series of positive terms are interesting because the study of their convergence is comparatively simple and can be used in the determination of convergence of more general series whose terms are not necessarily positive. It is easy to see that a series of positive terms diverges if and only if its sum is $+\infty$.

In what follows we shall introduce techniques that simplify the process of determining whether or not a given series of positive terms is convergent. We refer to these techniques as tests of convergence. The advantage of these tests is that they are in general easier to apply than the Cauchy criterion. This is because evaluating or obtaining inequalities involving the expression $\sum_{i=m+1}^{n} a_i$ in Theorem 5.2.1 can be somewhat difficult. The tests of convergence, however, have the disadvantage that they can sometimes fail to determine convergence or divergence, as we shall soon find out. It should be remembered that these tests apply only to series of positive terms.

The Comparison Test

This test is based on the following theorem:

Theorem 5.2.4. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be two series of positive terms such that $a_n \le b_n$ for $n > N_0$, where $N_0$ is a fixed integer.

i. If $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$.

ii. If $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ is divergent too.

Proof. We have that

$$\sum_{i=m+1}^{n} a_i \le \sum_{i=m+1}^{n} b_i \quad \text{for } n > m > N_0. \tag{5.15}$$

If $\sum_{n=1}^{\infty} b_n$ is convergent, then for a given $\epsilon > 0$ there exists an integer $N_1$ such that

$$\sum_{i=m+1}^{n} b_i < \epsilon \quad \text{for } n > m > N_1. \tag{5.16}$$

From (5.15) and (5.16) it follows that if $n > m > N$, where $N = \max(N_0, N_1)$, then

$$\sum_{i=m+1}^{n} a_i < \epsilon,$$

which proves (i).

The proof of (ii) follows from applying the law of contraposition to (i). □


To determine convergence or divergence of $\sum_{n=1}^{\infty} a_n$ we thus need to have in our repertoire a collection of series of positive terms whose behavior (with regard to convergence or divergence) is known. These series can then be compared against $\sum_{n=1}^{\infty} a_n$. For this purpose, the following series can be useful:

a. $\sum_{n=1}^{\infty} 1/n$. This is a divergent series called the harmonic series.

b. $\sum_{n=1}^{\infty} 1/n^k$. This is divergent if $k \le 1$ and is convergent if $k > 1$.

To prove that the harmonic series is divergent, let us consider its $n$th partial sum, namely,

$$s_n = \sum_{i=1}^{n} \frac{1}{i}.$$

Let $A > 0$ be an arbitrary positive number. Choose $n$ large enough so that $n > 2^m$, where $m > 2A$. Then for such values of $n$,

$$\begin{aligned}
s_n &> \left(1 + \frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^{m-1}+1} + \cdots + \frac{1}{2^m}\right) \\
&> \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \cdots + \frac{2^{m-1}}{2^m} = \frac{m}{2} > A.
\end{aligned} \tag{5.17}$$

Since $A$ is arbitrary and $s_n$ is a monotone increasing function of $n$, inequality (5.17) implies that $s_n \to \infty$ as $n \to \infty$. This proves divergence of the harmonic series.
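Inequality (5.17) can be watched in action: the harmonic partial sums grow without bound, but only logarithmically. The Python sketch below accumulates $s_n$ incrementally and records it at $n = 2^m + 1$ (any $n > 2^m$ would do), so each value can be compared with the lower bound $m/2$. The function name is illustrative.

```python
def harmonic_at_powers_of_two(max_m):
    """Return {m: s_n} with n = 2^m + 1, where s_n is the n-th harmonic partial sum."""
    results, s, i = {}, 0.0, 0
    for m in range(1, max_m + 1):
        n = 2 ** m + 1
        while i < n:          # extend the running sum up to index n
            i += 1
            s += 1.0 / i
        results[m] = s
    return results

sums = harmonic_at_powers_of_two(16)
for m, s_n in sums.items():
    print(m, s_n, m / 2)      # s_n exceeds m/2, as in (5.17)
```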

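Let us next consider the series in (b). If $k \le 1$, then $1/n^k \ge 1/n$ and $\sum_{n=1}^{\infty} (1/n^k)$ must be divergent by Theorem 5.2.4(ii). Suppose now that $k > 1$. Consider the $n$th partial sum of the series, namely,

$$s_n = \sum_{i=1}^{n} \frac{1}{i^k}.$$

Then, by choosing $m$ large enough so that $2^m > n$ we get

$$\begin{aligned}
s_n &\le \sum_{i=1}^{2^m - 1} \frac{1}{i^k} = 1 + \left(\frac{1}{2^k} + \frac{1}{3^k}\right) + \left(\frac{1}{4^k} + \frac{1}{5^k} + \frac{1}{6^k} + \frac{1}{7^k}\right) + \cdots + \sum_{i=2^{m-1}}^{2^m - 1} \frac{1}{i^k} \\
&\le 1 + \left(\frac{1}{2^k} + \frac{1}{2^k}\right) + \left(\frac{1}{4^k} + \frac{1}{4^k} + \frac{1}{4^k} + \frac{1}{4^k}\right) + \cdots + \frac{2^{m-1}}{(2^{m-1})^k} \\
&= 1 + \frac{2}{2^k} + \frac{4}{4^k} + \cdots + \frac{2^{m-1}}{(2^{m-1})^k} \\
&= \sum_{i=1}^{m} a^{i-1},
\end{aligned} \tag{5.18}$$

where $a = 1/2^{k-1}$. But the right-hand side of (5.18) represents the $m$th partial sum of a convergent geometric series (since $a < 1$). Hence, as $m \to \infty$, the right-hand side of (5.18) converges to

$$\sum_{i=1}^{\infty} a^{i-1} = \frac{1}{1-a}$$

(see Example 5.2.1). Since the right-hand side of (5.18) increases with $m$, it follows that $s_n \le 1/(1-a)$ for all $n$. Thus the sequence $\{s_n\}_{n=1}^{\infty}$ is bounded. Since it is also monotone increasing, it must be convergent (see Theorem 5.1.2). This proves convergence of the series $\sum_{n=1}^{\infty} (1/n^k)$ for $k > 1$.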
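The bound $s_n \le 1/(1-a)$ with $a = 1/2^{k-1}$ can be compared with actual partial sums. In the Python sketch below (illustrative names), note that for $k = 2$ the bound is $2$, while the true sum is $\pi^2/6 \approx 1.645$, a classical value not derived in the text.

```python
import math

def p_series_partial_sum(k, n):
    """s_n = sum_{i=1}^{n} 1/i^k."""
    return sum(1.0 / i ** k for i in range(1, n + 1))

def grouping_bound(k):
    """Upper bound 1/(1 - a) with a = 1/2^(k-1), valid for k > 1 (from (5.18))."""
    a = 2.0 ** (1 - k)
    return 1.0 / (1.0 - a)

for k in (1.5, 2, 3):
    print(k, p_series_partial_sum(k, 20000), grouping_bound(k))

print(p_series_partial_sum(2, 100000), math.pi ** 2 / 6)
```

The bound is crude (it is the sum of a dominating geometric series), but crudeness does not matter here: boundedness plus monotonicity is all that Theorem 5.1.2 needs.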





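Another version of the comparison test in Theorem 5.2.4 that is easier to implement is given by the following theorem:

Theorem 5.2.5. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be two series of positive terms. If there exists a positive constant $l$ such that $a_n$ and $l b_n$ are asymptotically equal, $a_n \sim l b_n$ as $n \to \infty$ (see Section 3.3), that is,

$$\lim_{n \to \infty} \frac{a_n}{l b_n} = 1,$$

then the two series are either both convergent or both divergent.

Proof. There exists an integer $N$ such that

$$\left| \frac{a_n}{l b_n} - 1 \right| < \frac{1}{2} \quad \text{if } n > N,$$

or equivalently,

$$\frac{1}{2} < \frac{a_n}{l b_n} < \frac{3}{2} \quad \text{whenever } n > N.$$

If $\sum_{n=1}^{\infty} a_n$ is convergent, then $\sum_{n=1}^{\infty} b_n$ is convergent by a combination of Theorems 5.2.2(1) and 5.2.4(i), since $b_n < (2/l) a_n$. Similarly, if $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$, since $a_n < (3l/2) b_n$. If $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ is divergent too by a combination of Theorems 5.2.2(1) and 5.2.4(ii), since $b_n > [2/(3l)] a_n$. Finally, $\sum_{n=1}^{\infty} a_n$ diverges if the same is true of $\sum_{n=1}^{\infty} b_n$, since $a_n > (l/2) b_n$. □

Example 5.2.3. The series $\sum_{n=1}^{\infty} (n+2)/(n^3 + 2n + 1)$ is convergent, since

$$\frac{n+2}{n^3 + 2n + 1} \sim \frac{1}{n^2} \quad \text{as } n \to \infty,$$

which is the $n$th term of a convergent series [recall that $\sum_{n=1}^{\infty} (1/n^k)$ is convergent if $k > 1$].

Example 5.2.4. The series $\sum_{n=1}^{\infty} 1/\sqrt{n(n+1)}$ is divergent, because

$$\frac{1}{\sqrt{n(n+1)}} \sim \frac{1}{n} \quad \text{as } n \to \infty,$$

which is the $n$th term of the divergent harmonic series.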
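Applying Theorem 5.2.5 amounts to checking that $a_n/(l b_n) \to 1$. The Python sketch below evaluates these ratios (with $l = 1$) for Examples 5.2.3 and 5.2.4. It assumes the $n$th term of Example 5.2.3 is $(n+2)/(n^3 + 2n + 1)$; the denominator is partly illegible in the source, so treat that as an assumption.

```python
import math

def ratio_523(n):
    # a_n / b_n with a_n = (n + 2)/(n^3 + 2n + 1) (assumed form) and b_n = 1/n^2
    return ((n + 2) / (n ** 3 + 2 * n + 1)) * n ** 2

def ratio_524(n):
    # a_n / b_n with a_n = 1/sqrt(n(n + 1)) and b_n = 1/n
    return n / math.sqrt(n * (n + 1))

for n in (10, 1000, 100000):
    print(n, ratio_523(n), ratio_524(n))   # both ratios approach 1
```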





The Ratio or d'Alembert's Test

This test is usually attributed to the French mathematician Jean Baptiste d'Alembert (1717-1783), but is also known as Cauchy's ratio test after Augustin-Louis Cauchy (1789-1857).

Theorem 5.2.6. Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Then the following hold:

1. The series converges if $\limsup_{n \to \infty} (a_{n+1}/a_n) < 1$ (see Definition 5.1.2).

2. The series diverges if $\liminf_{n \to \infty} (a_{n+1}/a_n) > 1$ (see Definition 5.1.2).

3. If $\liminf_{n \to \infty} (a_{n+1}/a_n) \le 1 \le \limsup_{n \to \infty} (a_{n+1}/a_n)$, no conclusion can be made regarding convergence or divergence of the series (that is, the ratio test fails).

In particular, if $\lim_{n \to \infty} (a_{n+1}/a_n) = r$ exists, then the following hold:

1. The series converges if $r < 1$.

2. The series diverges if $r > 1$.

3. The test fails if $r = 1$.

Proof. Let $p = \liminf_{n \to \infty} (a_{n+1}/a_n)$, $q = \limsup_{n \to \infty} (a_{n+1}/a_n)$.

1. If $q < 1$, then by the definition of the upper limit (Definition 5.1.2), there exists an integer $N$ such that

$$\frac{a_{n+1}}{a_n} < q' \quad \text{for } n \ge N, \tag{5.19}$$

where $q'$ is chosen such that $q < q' < 1$. (If $a_{n+1}/a_n \ge q'$ for infinitely many values of $n$, then the sequence $\{a_{n+1}/a_n\}_{n=1}^{\infty}$ has a subsequential limit greater than or equal to $q'$, which exceeds $q$. This contradicts the definition of $q$.) From (5.19) we then get

$$a_{N+1} < a_N q', \quad a_{N+2} < a_{N+1} q' < a_N q'^2, \quad \ldots, \quad a_{N+m} < a_N q'^m,$$

where $m \ge 1$. Thus for $n > N$,

$$a_n < a_N q'^{\,n-N} = a_N q'^{-N} q'^n.$$

Hence, the series converges by comparison with the convergent geometric series $\sum_{n=1}^{\infty} q'^n$, since $q' < 1$.

2. If $p > 1$, then in an analogous manner we can find an integer $N$ such that

$$\frac{a_{n+1}}{a_n} > p' \quad \text{for } n \ge N, \tag{5.20}$$

where $p'$ is chosen such that $p > p' > 1$. But this implies that $a_n$ cannot tend to zero as $n \to \infty$, and the series is therefore divergent by Result 5.2.1.

3. If $p \le 1 \le q$, then we can demonstrate by using an example that the ratio test is inconclusive: Consider the two series $\sum_{n=1}^{\infty} (1/n)$, $\sum_{n=1}^{\infty} (1/n^2)$. For both series, $p = q = 1$ and hence $p \le 1 \le q$, since $\lim_{n \to \infty} (a_{n+1}/a_n) = 1$. But the first series is divergent while the second is convergent, as was seen earlier. □

Example 5.2.5. Consider the same series as in Example 5.2.2. This series was shown to be convergent by the Cauchy criterion. Let us now apply the ratio test. In this case,

$$\lim_{n \to \infty} \frac{a_{n+1}}{a_n} = \lim_{n \to \infty} \frac{n!}{(n+1)!} = \lim_{n \to \infty} \frac{1}{n+1} = 0 < 1,$$

which indicates convergence by Theorem 5.2.6(1).
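The ratio computed in Example 5.2.5 can be confirmed directly from exact factorials. The short Python sketch below forms $a_{n+1}/a_n = n!/(n+1)!$ using Python's arbitrary-precision integers (true division of two integers returns a float) and compares it with $1/(n+1)$.

```python
import math

def ratio(n):
    """a_{n+1}/a_n for a_n = 1/n!, computed as n!/(n+1)! from exact integer factorials."""
    return math.factorial(n) / math.factorial(n + 1)

print([round(ratio(n), 4) for n in (1, 10, 100)])   # 1/(n+1), which tends to 0
```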

Nurcombe (1979) stated and proved the following extension of the ratio test:

Theorem 5.2.7. Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms, and $k$ be a fixed positive integer.

1. If $\lim_{n \to \infty} (a_{n+k}/a_n) < 1$, then the series converges.

2. If $\lim_{n \to \infty} (a_{n+k}/a_n) > 1$, then the series diverges.

This test reduces to the ratio test when $k = 1$.

The Root or Cauchy's Test

This is a more powerful test than the ratio test. It is based on the following theorem:

Theorem 5.2.8. Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Let $\limsup_{n \to \infty} a_n^{1/n} = \rho$. Then we have the following:

1. The series converges if $\rho < 1$.

2. The series diverges if $\rho > 1$.

3. The test is inconclusive if $\rho = 1$.

In particular, if $\lim_{n \to \infty} a_n^{1/n} = \tau$ exists, then we have the following:

1. The series converges if $\tau < 1$.

2. The series diverges if $\tau > 1$.

3. The test is inconclusive if $\tau = 1$.

Proof.

1. As in Theorem 5.2.6(1), if $\rho < 1$, then there is an integer $N$ such that

$$a_n^{1/n} < \rho' \quad \text{for } n \ge N,$$

where $\rho'$ is chosen such that $\rho < \rho' < 1$. Thus

$$a_n < \rho'^n \quad \text{for } n \ge N.$$

The series is therefore convergent by comparison with the convergent geometric series $\sum_{n=1}^{\infty} \rho'^n$, since $\rho' < 1$.

2. Suppose that $\rho > 1$. Let $\epsilon > 0$ be such that $\epsilon < \rho - 1$. Then

$$a_n^{1/n} > \rho - \epsilon > 1$$

for infinitely many values of $n$ (why?). Thus for such values of $n$,

$$a_n > (\rho - \epsilon)^n > 1,$$

which implies that $a_n$ cannot tend to zero as $n \to \infty$, and the series is therefore divergent by Result 5.2.1.

3. Consider again the two series $\sum_{n=1}^{\infty} (1/n)$, $\sum_{n=1}^{\infty} (1/n^2)$. In both cases $\rho = 1$ (see Exercise 5.18). The test therefore fails, since the first series is divergent and the second is convergent. □

Note 5.2.1. We have mentioned earlier that the root test is more powerful than the ratio test. By this we mean that whenever the ratio test shows convergence or divergence, then so does the root test; whenever the root test is inconclusive, the ratio test is inconclusive too. However, there are situations where the ratio test fails, but the root test does not (see Example 5.2.6). This fact is based on the following theorem:





Theorem 5.2.9. If $a_n > 0$, then

$$\liminf_{n \to \infty} \frac{a_{n+1}}{a_n} \le \liminf_{n \to \infty} a_n^{1/n} \le \limsup_{n \to \infty} a_n^{1/n} \le \limsup_{n \to \infty} \frac{a_{n+1}}{a_n}.$$

Proof. It is sufficient to prove the two inequalities

$$\limsup_{n \to \infty} a_n^{1/n} \le \limsup_{n \to \infty} \frac{a_{n+1}}{a_n}, \tag{5.21}$$

$$\liminf_{n \to \infty} \frac{a_{n+1}}{a_n} \le \liminf_{n \to \infty} a_n^{1/n}. \tag{5.22}$$

Inequality (5.21): Let $q = \limsup_{n \to \infty} (a_{n+1}/a_n)$. If $q = \infty$, then there is nothing to prove. Let us therefore consider that $q$ is finite. If we choose $q'$ such that $q < q'$, then as in the proof of Theorem 5.2.6(1), we can find an integer $N$ such that

$$a_n < a_N (q')^{-N} (q')^n \quad \text{for } n > N.$$

Hence,

$$a_n^{1/n} < \left[a_N (q')^{-N}\right]^{1/n} q'. \tag{5.23}$$

As $n \to \infty$, the limit of the right-hand side of inequality (5.23) is $q'$. It follows that

$$\limsup_{n \to \infty} a_n^{1/n} \le q'. \tag{5.24}$$

Since (5.24) is true for any $q' > q$, then we must also have

$$\limsup_{n \to \infty} a_n^{1/n} \le q.$$

Inequality (5.22): Let $p = \liminf_{n \to \infty} (a_{n+1}/a_n)$. We can consider $p$ to be finite (if $p = \infty$, then $q = \infty$ and the proof of the theorem will be complete; if $p = 0$, then there is nothing to prove). Let $p'$ be chosen such that $0 < p' < p$. As in the proof of Theorem 5.2.6(2), we can find an integer $N$ such that

$$\frac{a_{n+1}}{a_n} > p' \quad \text{for } n \ge N. \tag{5.25}$$

From (5.25) it is easy to show that

$$a_n > a_N (p')^{-N} (p')^n \quad \text{for } n > N.$$

Hence, for such values of $n$,

$$a_n^{1/n} > \left[a_N (p')^{-N}\right]^{1/n} p'.$$

Consequently,

$$\liminf_{n \to \infty} a_n^{1/n} \ge p'. \tag{5.26}$$

Since (5.26) is true for any $p' < p$, then

$$\liminf_{n \to \infty} a_n^{1/n} \ge p.$$

From Theorem 5.2.9 we can easily see that whenever $q < 1$, then $\limsup_{n \to \infty} a_n^{1/n} < 1$; whenever $p > 1$, then $\limsup_{n \to \infty} a_n^{1/n} > 1$. In both cases, if convergence or divergence of the series is resolved by the ratio test, then it can also be resolved by the root test. If, however, the root test fails (when $\limsup_{n \to \infty} a_n^{1/n} = 1$), then the ratio test fails too by Theorem 5.2.6(3). On the other hand, it is possible for the ratio test to be inconclusive whereas the root test is not. This occurs when

$$\liminf_{n \to \infty} \frac{a_{n+1}}{a_n} \le \liminf_{n \to \infty} a_n^{1/n} \le \limsup_{n \to \infty} a_n^{1/n} < 1 \le \limsup_{n \to \infty} \frac{a_{n+1}}{a_n}.$$

□


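Example 5.2.6. Consider the series $\sum_{n=1}^{\infty} (a^n + b^n)$, where $0 < a < b < 1$. This can be written as $\sum_{n=1}^{\infty} c_n$, where for $n \ge 1$,

$$c_n = \begin{cases} a^{(n+1)/2} & \text{if } n \text{ is odd}, \\ b^{n/2} & \text{if } n \text{ is even}. \end{cases}$$

Now,

$$\frac{c_{n+1}}{c_n} = \begin{cases} (b/a)^{(n+1)/2} & \text{if } n \text{ is odd}, \\ a\,(a/b)^{n/2} & \text{if } n \text{ is even}, \end{cases}
\qquad
c_n^{1/n} = \begin{cases} a^{(n+1)/(2n)} & \text{if } n \text{ is odd}, \\ b^{1/2} & \text{if } n \text{ is even}. \end{cases}$$

As $n \to \infty$, $c_{n+1}/c_n$ has two subsequential limits, namely $0$ and $\infty$; $c_n^{1/n}$ has two subsequential limits, $a^{1/2}$ and $b^{1/2}$. Thus

$$\liminf_{n \to \infty} \frac{c_{n+1}}{c_n} = 0, \qquad \limsup_{n \to \infty} \frac{c_{n+1}}{c_n} = \infty, \qquad \limsup_{n \to \infty} c_n^{1/n} = b^{1/2} < 1.$$

Since $0 < 1 < \infty$, we can clearly see that the ratio test is inconclusive, whereas the root test indicates that the series is convergent.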
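Example 5.2.6 can be reproduced numerically for concrete values. The Python sketch below assumes $a = 1/4$ and $b = 1/2$ (any $0 < a < b < 1$ would do): the ratios $c_{n+1}/c_n$ swing between tiny and huge values, while the roots $c_n^{1/n}$ stay near $a^{1/2}$ or $b^{1/2}$. Rearranging the terms is harmless here because all terms are positive, so the sum is $a/(1-a) + b/(1-b) = 4/3$ by Example 5.2.1.

```python
def c(n, a=0.25, b=0.5):
    """n-th term of a + b + a^2 + b^2 + ...: a^((n+1)/2) for odd n, b^(n/2) for even n."""
    return a ** ((n + 1) // 2) if n % 2 == 1 else b ** (n // 2)

ratios = [c(n + 1) / c(n) for n in range(1, 200)]
roots = [c(n) ** (1.0 / n) for n in range(1, 200)]

print(min(ratios), max(ratios))     # tends to 0 on one subsequence, grows without bound on the other
print(max(roots[-20:]))             # close to sqrt(b) = 0.7071...
print(sum(c(n) for n in range(1, 400)), 0.25 / 0.75 + 0.5 / 0.5)
```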

Maclaurin's (or Cauchy's) Integral Test

This test was introduced by Colin Maclaurin (1698-1746) and then rediscovered by Cauchy. The description and proof of this test will be given in Chapter 6.

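Cauchy's Condensation Test

Let us consider the following theorem:

Theorem 5.2.10. Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms, where $a_n$ is a monotone decreasing function of $n$ $(= 1, 2, \ldots)$. Then $\sum_{n=1}^{\infty} a_n$ converges or diverges if and only if the same is true of the series $\sum_{n=1}^{\infty} 2^n a_{2^n}$.

Proof. Let $s_n$ and $t_m$ be the $n$th and $m$th partial sums, respectively, of $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} 2^n a_{2^n}$. If $m$ is such that $n < 2^m$, then

$$\begin{aligned}
s_n &\le a_1 + (a_2 + a_3) + (a_4 + a_5 + a_6 + a_7) + \cdots + (a_{2^m} + a_{2^m+1} + \cdots + a_{2^{m+1}-1}) \\
&\le a_1 + 2a_2 + 4a_4 + \cdots + 2^m a_{2^m} = a_1 + t_m.
\end{aligned} \tag{5.27}$$

Furthermore, if $n > 2^m$, then

$$\begin{aligned}
s_n &\ge a_1 + a_2 + (a_3 + a_4) + \cdots + (a_{2^{m-1}+1} + \cdots + a_{2^m}) \\
&\ge \frac{1}{2} a_1 + a_2 + 2a_4 + \cdots + 2^{m-1} a_{2^m} = \frac{1}{2}(a_1 + t_m).
\end{aligned} \tag{5.28}$$

If $\sum_{n=1}^{\infty} 2^n a_{2^n}$ diverges, then $t_m \to \infty$ as $m \to \infty$. Hence, from (5.28), $s_n \to \infty$ as $n \to \infty$, and the series $\sum_{n=1}^{\infty} a_n$ is also divergent.

Now, if $\sum_{n=1}^{\infty} 2^n a_{2^n}$ converges, then the sequence $\{t_m\}_{m=1}^{\infty}$ is bounded. From (5.27), the sequence $\{s_n\}_{n=1}^{\infty}$ is also bounded. It follows that $\sum_{n=1}^{\infty} a_n$ is a convergent series (see Exercise 5.13). □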

Example 5.2.7. Consider again the series $\sum_{n=1}^{\infty} (1/n^k)$. We have already seen that this series converges if $k > 1$ and diverges if $k \le 1$. Let us now apply Cauchy's condensation test. In this case,

$$\sum_{n=1}^{\infty} 2^n a_{2^n} = \sum_{n=1}^{\infty} \frac{2^n}{2^{nk}} = \sum_{n=1}^{\infty} \frac{1}{2^{n(k-1)}}$$

is a geometric series $\sum_{n=1}^{\infty} b^n$, where $b = 1/2^{k-1}$. If $k \le 1$, then $b \ge 1$ and the series diverges. If $k > 1$, then $b < 1$ and the series converges. It is interesting to note that in this example, both the ratio and the root tests fail.
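The contrast drawn in Example 5.2.7 shows up clearly in numbers: the condensed terms $2^n a_{2^n}$ form an exact geometric sequence with ratio $b = 2^{1-k}$, while the ratio and root quantities of the original series both approach 1 for every $k$. The Python sketch below (illustrative names) evaluates both.

```python
import math

def condensed_term(k, n):
    """2^n * a_(2^n) for a_j = 1/j^k; algebraically equal to (1/2^(k-1))^n."""
    return 2.0 ** n / (2.0 ** n) ** k

def ratio_and_root(k, n):
    """Ratio a_(n+1)/a_n and root a_n^(1/n) for a_n = 1/n^k; both tend to 1."""
    ratio = (n / (n + 1)) ** k
    root = math.exp(-k * math.log(n) / n)
    return ratio, root

for k in (0.5, 1, 2):
    b = 2.0 ** (1 - k)
    print(k, b, condensed_term(k, 20), b ** 20, ratio_and_root(k, 10 ** 6))
```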

The following tests enable us to handle situations where the ratio test fails. These tests are particular cases of a general test called Kummer's test.

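Kummer's Test

This test is named after the German mathematician Ernst Eduard Kummer (1810-1893).

Theorem 5.2.11. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be two series of positive terms. Suppose that the series $\sum_{n=1}^{\infty} b_n$ is divergent. Let

$$\lim_{n \to \infty} \left( \frac{1}{b_n} \frac{a_n}{a_{n+1}} - \frac{1}{b_{n+1}} \right) = \lambda.$$

Then $\sum_{n=1}^{\infty} a_n$ converges if $\lambda > 0$ and diverges if $\lambda < 0$.

Proof. Suppose that $\lambda > 0$. We can find an integer $N$ such that for $n > N$,

$$\frac{1}{b_n} \frac{a_n}{a_{n+1}} - \frac{1}{b_{n+1}} > \frac{\lambda}{2}. \tag{5.29}$$

Multiplying both sides by $a_{n+1} > 0$, inequality (5.29) can also be written as

$$a_{n+1} < \frac{2}{\lambda} \left( \frac{a_n}{b_n} - \frac{a_{n+1}}{b_{n+1}} \right). \tag{5.30}$$

If $s_n$ is the $n$th partial sum of $\sum_{n=1}^{\infty} a_n$, then from (5.30) and for $n > N$,

$$s_{n+1} < s_{N+1} + \frac{2}{\lambda} \sum_{i=N+2}^{n+1} \left( \frac{a_{i-1}}{b_{i-1}} - \frac{a_i}{b_i} \right),$$

that is,

$$s_{n+1} < s_{N+1} + \frac{2}{\lambda} \left( \frac{a_{N+1}}{b_{N+1}} - \frac{a_{n+1}}{b_{n+1}} \right) < s_{N+1} + \frac{2}{\lambda} \frac{a_{N+1}}{b_{N+1}} \quad \text{for } n > N. \tag{5.31}$$

Inequality (5.31) indicates that the sequence $\{s_n\}_{n=1}^{\infty}$ is bounded. Hence, the series $\sum_{n=1}^{\infty} a_n$ is convergent (see Exercise 5.13).

Now, let us suppose that $\lambda < 0$. We can find an integer $N$ such that

$$\frac{1}{b_n} \frac{a_n}{a_{n+1}} - \frac{1}{b_{n+1}} < 0 \quad \text{for } n > N.$$

Thus for such values of $n$,

$$\frac{a_{n+1}}{b_{n+1}} > \frac{a_n}{b_n}. \tag{5.32}$$

It is easy to verify that because of (5.32),

$$a_n > \frac{a_{N+1}}{b_{N+1}}\, b_n \tag{5.33}$$

for $n \ge N + 2$. Since $\sum_{n=1}^{\infty} b_n$ is divergent, then from (5.33) and the use of the comparison test we conclude that $\sum_{n=1}^{\infty} a_n$ is divergent too. □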

Two particular cases of Kummer’s test are Raabe’s test and Gauss’s test. 


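Raabe's Test

This test was established in 1832 by J. L. Raabe.

Theorem 5.2.12. Suppose that $\sum_{n=1}^{\infty} a_n$ is a series of positive terms and that

$$\frac{a_n}{a_{n+1}} = 1 + \frac{\tau}{n} + o\left(\frac{1}{n}\right) \quad \text{as } n \to \infty.$$

Then $\sum_{n=1}^{\infty} a_n$ converges if $\tau > 1$ and diverges if $\tau < 1$.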





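Proof. We have that

$$\frac{a_n}{a_{n+1}} = 1 + \frac{\tau}{n} + o\left(\frac{1}{n}\right).$$

This means that

$$n \left( \frac{a_n}{a_{n+1}} - 1 - \frac{\tau}{n} \right) \to 0 \tag{5.34}$$

as $n \to \infty$. Equivalently, (5.34) can be expressed as

$$\lim_{n \to \infty} \left( \frac{n\, a_n}{a_{n+1}} - n - 1 \right) = \tau - 1. \tag{5.35}$$

The left-hand side of (5.35) is the limit in Kummer's test (Theorem 5.2.11) with $b_n = 1/n$, which is the $n$th term of a divergent series. If we now apply Kummer's test, we conclude that the series $\sum_{n=1}^{\infty} a_n$ converges if $\tau - 1 > 0$ and diverges if $\tau - 1 < 0$. □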

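Gauss's Test

This test is named after Carl Friedrich Gauss (1777-1855). It provides a slight improvement over Raabe's test in that it usually enables us to handle the case $\tau = 1$. For such a value of $\tau$, Raabe's test is inconclusive.

Theorem 5.2.13. Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Suppose that

$$\frac{a_n}{a_{n+1}} = 1 + \frac{\theta}{n} + O\left(\frac{1}{n^{\delta+1}}\right), \qquad \delta > 0.$$

Then $\sum_{n=1}^{\infty} a_n$ converges if $\theta > 1$ and diverges if $\theta \le 1$.

Proof. Since

$$O\left(\frac{1}{n^{\delta+1}}\right) = o\left(\frac{1}{n}\right),$$

then by Raabe's test, $\sum_{n=1}^{\infty} a_n$ converges if $\theta > 1$ and diverges if $\theta < 1$. Let us therefore consider $\theta = 1$. We have

$$\frac{a_n}{a_{n+1}} = 1 + \frac{1}{n} + O\left(\frac{1}{n^{\delta+1}}\right).$$

Put $b_n = 1/(n \log n)$ for $n \ge 2$, and consider

$$\begin{aligned}
\lim_{n \to \infty} \left( \frac{1}{b_n} \frac{a_n}{a_{n+1}} - \frac{1}{b_{n+1}} \right)
&= \lim_{n \to \infty} \left\{ n \log n \left[ 1 + \frac{1}{n} + O\left(\frac{1}{n^{\delta+1}}\right) \right] - (n+1) \log(n+1) \right\} \\
&= \lim_{n \to \infty} \left[ (n+1) \log \frac{n}{n+1} + (n \log n)\, O\left(\frac{1}{n^{\delta+1}}\right) \right] = -1.
\end{aligned}$$

This is true because

$$\lim_{n \to \infty} (n+1) \log \frac{n}{n+1} = -1 \quad \text{(by l'Hospital's rule)}$$

and

$$\lim_{n \to \infty} (n \log n)\, O\left(\frac{1}{n^{\delta+1}}\right) = 0 \quad \text{[see Example 4.2.3(2)]}.$$

Since $\sum_{n=2}^{\infty} [1/(n \log n)]$ is a divergent series (this can be shown by using Cauchy's condensation test), then by Kummer's test, the series $\sum_{n=1}^{\infty} a_n$ is divergent. □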

Example 5.2.8. Gauss established his test in order to determine the convergence of the so-called hypergeometric series. He managed to do so in an article published in 1812. This series is of the form $1 + \sum_{n=1}^{\infty} a_n$, where

$$a_n = \frac{\alpha(\alpha+1)(\alpha+2)\cdots(\alpha+n-1)\,\beta(\beta+1)(\beta+2)\cdots(\beta+n-1)}{n!\,\gamma(\gamma+1)(\gamma+2)\cdots(\gamma+n-1)}, \qquad n = 1, 2, \ldots,$$

where $\alpha, \beta, \gamma$ are real numbers, and none of them is zero or a negative integer. We have

$$\frac{a_n}{a_{n+1}} = \frac{(n+1)(n+\gamma)}{(n+\alpha)(n+\beta)} = \frac{n^2 + (\gamma+1)n + \gamma}{n^2 + (\alpha+\beta)n + \alpha\beta} = 1 + \frac{\gamma + 1 - \alpha - \beta}{n} + O\left(\frac{1}{n^2}\right).$$

In this case, $\theta = \gamma + 1 - \alpha - \beta$ and $\delta = 1$. By Gauss's test, this series is convergent if $\theta > 1$, or $\gamma > \alpha + \beta$, and is divergent if $\theta \le 1$, or $\gamma \le \alpha + \beta$.
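The expansion in Example 5.2.8 can be checked numerically by generating the terms through the ratio $a_{n+1}/a_n = (n+\alpha)(n+\beta)/[(n+1)(n+\gamma)]$. The Python sketch below assumes $\alpha = \beta = 1/2$, $\gamma = 2$, so that $\theta = 2 > 1$. It estimates $\theta$ as $n(a_n/a_{n+1} - 1)$ and, as an outside cross-check not taken from the text, compares the partial sum with Gauss's classical summation formula $\Gamma(\gamma)\Gamma(\gamma-\alpha-\beta)/[\Gamma(\gamma-\alpha)\Gamma(\gamma-\beta)]$, which equals $4/\pi$ for these parameters.

```python
import math

def hypergeometric_terms(alpha, beta, gamma, n_terms):
    """Return [a_1, ..., a_{n_terms}] via a_{n+1} = a_n (n + alpha)(n + beta) / ((n + 1)(n + gamma))."""
    terms = [alpha * beta / gamma]          # a_1 = alpha * beta / (1! * gamma)
    for n in range(1, n_terms):
        terms.append(terms[-1] * (n + alpha) * (n + beta) / ((n + 1) * (n + gamma)))
    return terms

alpha, beta, gamma = 0.5, 0.5, 2.0          # theta = gamma + 1 - alpha - beta = 2
terms = hypergeometric_terms(alpha, beta, gamma, 100000)

n = len(terms) - 1                          # terms[n - 1] = a_n, terms[n] = a_{n+1}
theta_estimate = n * (terms[n - 1] / terms[n] - 1)

gauss_value = math.gamma(gamma) * math.gamma(gamma - alpha - beta) / (
    math.gamma(gamma - alpha) * math.gamma(gamma - beta))
print(theta_estimate, 1 + sum(terms), gauss_value)
```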





5.2.2. Series of Positive and Negative Terms

Consider the series $\sum_{n=1}^{\infty} a_n$, where $a_n$ may be positive or negative for $n \ge 1$. The convergence of this general series can be determined by the Cauchy criterion (Theorem 5.1.6). However, it is more convenient to consider the series $\sum_{n=1}^{\infty} |a_n|$ of absolute values, to which the tests of convergence in Section 5.2.1 can be applied. We recall from Definition 5.2.2 that if the latter series converges, then the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. This is a stronger type of convergence than the one given in Definition 5.2.1, since by Theorem 5.2.3 convergence of $\sum_{n=1}^{\infty} |a_n|$ implies convergence of $\sum_{n=1}^{\infty} a_n$. The converse, however, is not necessarily true; that is, convergence of $\sum_{n=1}^{\infty} a_n$ does not necessarily imply convergence of $\sum_{n=1}^{\infty} |a_n|$. For example, consider the series

$$\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \tag{5.36}$$

This series is convergent by the result of Example 5.1.6. It is not, however, absolutely convergent, since $\sum_{n=1}^{\infty} [1/(2n-1)]$ is divergent by comparison with the harmonic series $\sum_{n=1}^{\infty} (1/n)$, which is divergent (note that $1/(2n-1) \sim \frac{1}{2}(1/n)$ as $n \to \infty$; see Theorem 5.2.5). We recall that a series such as (5.36) that converges, but not absolutely, is called a conditionally convergent series.

The series in (5.36) belongs to a special class of series known as alternating series.

Definition 5.2.3. The series $\sum_{n=1}^{\infty} (-1)^{n-1} a_n$, where $a_n > 0$ for $n \ge 1$, is called an alternating series. □

The following theorem, which was established by Gottfried Wilhelm 
Leibniz (1646-1716), can be used to détermine convergence of alternating 
sériés: 

Theorem 5.2.14. Let E“ = i( — 1)”~^^ï„ be an alternating sériés such that 
the sequence is monotone decreasing and converges to zéro as 

^ CO. Then the sériés is convergent. 

Proof. Let $s_n$ be the $n$th partial sum of the series, and let $m$ be an integer such that $m < n$. Then

$$s_n - s_m = \sum_{i=m+1}^{n} (-1)^{i-1} a_i = (-1)^m \left[a_{m+1} - a_{m+2} + \cdots + (-1)^{n-m-1} a_n\right]. \tag{5.37}$$

Since $\{a_n\}_{n=1}^{\infty}$ is monotone decreasing, it is easy to show that the quantity inside brackets in (5.37) is nonnegative. Hence,

$$|s_n - s_m| = a_{m+1} - a_{m+2} + \cdots + (-1)^{n-m-1} a_n.$$

Now, if $n - m$ is odd, then

$$|s_n - s_m| = a_{m+1} - (a_{m+2} - a_{m+3}) - \cdots - (a_{n-1} - a_n) \le a_{m+1}.$$

If $n - m$ is even, then

$$|s_n - s_m| = a_{m+1} - (a_{m+2} - a_{m+3}) - \cdots - (a_{n-2} - a_{n-1}) - a_n \le a_{m+1}.$$

Thus in both cases

$$|s_n - s_m| \le a_{m+1}.$$

Since the sequence $\{a_n\}_{n=1}^{\infty}$ converges to zero, then for a given $\epsilon > 0$ there exists an integer $N$ such that for $m > N$, $a_{m+1} < \epsilon$. Consequently,

$$|s_n - s_m| < \epsilon \quad \text{if } n > m > N.$$

By Theorem 5.2.1, the alternating series is convergent. □
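The key inequality in this proof, $|s_n - s_m| \le a_{m+1}$, is easy to check numerically. The following Python sketch (an illustration only, not part of the original argument; the helper names are hypothetical) uses exact rational arithmetic from the standard `fractions` module to verify it for the series (5.36), where $a_n = 1/(2n-1)$, and compares a long floating-point partial sum with $\pi/4$, the well-known sum of (5.36) (Leibniz's series).

```python
import math
from fractions import Fraction

def a(n):
    """Magnitude of the nth term of series (5.36): a_n = 1/(2n - 1), n >= 1."""
    return Fraction(1, 2 * n - 1)

def exact_partial_sums(max_n):
    """Exact partial sums s_0 = 0, s_1, ..., s_max_n of (5.36)."""
    sums = [Fraction(0)]
    for i in range(1, max_n + 1):
        sums.append(sums[-1] + (-1) ** (i - 1) * a(i))
    return sums

def bound_holds(max_n=60):
    """Check |s_n - s_m| <= a_{m+1} for every pair 1 <= m < n <= max_n."""
    s = exact_partial_sums(max_n)
    return all(abs(s[n] - s[m]) <= a(m + 1)
               for m in range(1, max_n)
               for n in range(m + 1, max_n + 1))

def float_partial_sum(n):
    """Floating-point s_n of (5.36), accumulated accurately with math.fsum."""
    return math.fsum((-1) ** (i - 1) / (2 * i - 1) for i in range(1, n + 1))

print(bound_holds())                          # True
print(float_partial_sum(2000), math.pi / 4)
```

Letting $n \to \infty$ in the bound gives $|s - s_m| \le a_{m+1}$, so the printed partial sum is within $1/4001$ of $\pi/4$.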

Example 5.2.9. The series given by formula (5.36) was shown earlier to be convergent. This result can now be easily verified with the help of Theorem 5.2.14.

Example 5.2.10. The series $\sum_{n=1}^{\infty} (-1)^n/n^k$ is absolutely convergent if $k > 1$, is conditionally convergent if $0 < k \le 1$, and is divergent if $k \le 0$ (since the $n$th term does not go to zero).

Example 5.2.11. The series $\sum_{n=2}^{\infty} (-1)^n/(n \log n)$ is conditionally convergent, since it converges by Theorem 5.2.14, but the series of absolute values diverges by Cauchy's condensation test (Theorem 5.2.10).


5.2.3. Rearrangement of Series

One of the main differences between infinite series and finite series is that whereas the latter are amenable to the laws of algebra, the former are not necessarily so. In particular, if the order of terms of an infinite series is altered, its sum (assuming it converges) may, in general, change; or worse, the altered series may even diverge. Before discussing this rather disturbing phenomenon, let us consider the following definition:

Definition 5.2.4. Let $J^+$ denote the set of positive integers and $\sum_{n=1}^{\infty} a_n$ be a given series. Then a second series such as $\sum_{n=1}^{\infty} b_n$ is said to be a rearrangement of $\sum_{n=1}^{\infty} a_n$ if there exists a one-to-one and onto function $f: J^+ \to J^+$ such that $b_n = a_{f(n)}$ for $n \ge 1$.

For example, the series

$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots, \tag{5.38}$$

where two positive terms are followed by one negative term, is a rearrangement of the alternating harmonic series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots. \tag{5.39}$$

The series in (5.39) is conditionally convergent, as is the series in (5.38). However, the two series have different sums (see Exercise 5.21). □
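A short numerical experiment makes the difference between (5.38) and (5.39) visible. The Python sketch below (illustrative only; function names are hypothetical) groups (5.38) into blocks $1/(4j-3) + 1/(4j-1) - 1/(2j)$, $j = 1, 2, \ldots$, and compares partial sums with $\ln 2$, the well-known sum of (5.39), and with $\frac{3}{2}\ln 2$, the value a standard computation gives for (5.38).

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of (5.39): 1 - 1/2 + 1/3 - 1/4 + ..."""
    return math.fsum((-1) ** (k - 1) / k for k in range(1, n_terms + 1))

def two_positive_one_negative(n_blocks):
    """Partial sum of (5.38) over n_blocks blocks of the form
    1/(4j-3) + 1/(4j-1) - 1/(2j), j = 1, ..., n_blocks."""
    return math.fsum(1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
                     for j in range(1, n_blocks + 1))

print(alternating_harmonic(300000), math.log(2))
print(two_positive_one_negative(100000), 1.5 * math.log(2))
```

Both orderings use exactly the same terms, yet the partial sums settle near two different values.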

Fortunately, for absolutely convergent series we have the following theorem:

Theorem 5.2.15. If the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then any rearrangement of it remains absolutely convergent and has the same sum.


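Proof. Suppose that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent and that $\sum_{n=1}^{\infty} b_n$ is a rearrangement of it. By Theorem 5.2.1, for a given $\epsilon > 0$, there exists an integer $N$ such that for all $n > m > N$,

$$\sum_{i=m+1}^{n} |a_i| < \frac{\epsilon}{2}.$$

We then have

$$\sum_{k=1}^{\infty} |a_{m+k}| \le \frac{\epsilon}{2} \quad \text{if } m > N.$$

Now, let us choose an integer $M$ large enough so that

$$\{1, 2, \ldots, N+1\} \subset \{f(1), f(2), \ldots, f(M)\}.$$

It follows that if $n > M$, then $f(n) \ge N + 2$. Consequently, for $n > m > M$,

$$\sum_{i=m+1}^{n} |b_i| = \sum_{i=m+1}^{n} |a_{f(i)}| \le \sum_{k=1}^{\infty} |a_{N+k+1}| \le \frac{\epsilon}{2}.$$

This implies that the series $\sum_{n=1}^{\infty} |b_n|$ satisfies the Cauchy criterion of Theorem 5.2.1. Therefore, $\sum_{n=1}^{\infty} b_n$ is absolutely convergent.

We now show that the two series have the same sum. Let $s = \sum_{n=1}^{\infty} a_n$, and let $s_n$ be its $n$th partial sum. Then, for a given $\epsilon > 0$, there exists an integer $N$ large enough so that

$$\sum_{k=1}^{\infty} |a_{N+k+1}| \le \frac{\epsilon}{2} \quad \text{and} \quad |s_{N+1} - s| < \frac{\epsilon}{2}.$$

If $t_n$ is the $n$th partial sum of $\sum_{n=1}^{\infty} b_n$, then

$$|t_n - s| \le |t_n - s_{N+1}| + |s_{N+1} - s|.$$

By choosing $M$ large enough as was done earlier, and by taking $n > M$, we get

$$|t_n - s_{N+1}| = \left|\sum_{i=1}^{n} b_i - \sum_{i=1}^{N+1} a_i\right| = \left|\sum_{i=1}^{n} a_{f(i)} - \sum_{i=1}^{N+1} a_i\right| \le \sum_{k=1}^{\infty} |a_{N+k+1}| \le \frac{\epsilon}{2},$$

since if $n > M$,

$$\{a_1, a_2, \ldots, a_{N+1}\} \subset \{a_{f(1)}, a_{f(2)}, \ldots, a_{f(n)}\}.$$

Hence, for $n > M$,

$$|t_n - s| < \epsilon,$$

which shows that the sum of the series $\sum_{n=1}^{\infty} b_n$ is $s$. □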
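By contrast with (5.38) and (5.39), Theorem 5.2.15 says that reordering cannot change the sum of an absolutely convergent series. As a numerical check (not from the original text; names are illustrative), the sketch below applies the same two-positive, one-negative pattern to the absolutely convergent series $\sum_{n=1}^{\infty} (-1)^{n-1}/n^2$, whose sum is known to be $\pi^2/12$; both orderings approach that value.

```python
import math

def original_order(n_terms):
    """Partial sum of 1 - 1/2^2 + 1/3^2 - 1/4^2 + ... (absolutely convergent)."""
    return math.fsum((-1) ** (k - 1) / k**2 for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Same terms taken two positive, one negative:
    1/(4j-3)^2 + 1/(4j-1)^2 - 1/(2j)^2 for j = 1, ..., n_blocks."""
    return math.fsum(1 / (4 * j - 3) ** 2 + 1 / (4 * j - 1) ** 2 - 1 / (2 * j) ** 2
                     for j in range(1, n_blocks + 1))

target = math.pi ** 2 / 12   # known sum of the alternating series of 1/n^2
print(original_order(200000), rearranged(100000), target)
```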


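Unlike absolutely convergent series, those that are conditionally convergent are susceptible to rearrangements of their terms. To demonstrate this, let us consider the following alternating series:

$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{\sqrt{n}}.$$

This series is conditionally convergent, since it is convergent by Theorem 5.2.14 while $\sum_{n=1}^{\infty} (1/\sqrt{n})$ is divergent. Let us consider the following rearrangement:

$$\sum_{n=1}^{\infty} b_n = 1 + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{5}} + \frac{1}{\sqrt{7}} - \frac{1}{\sqrt{4}} + \cdots, \tag{5.40}$$

in which two positive terms are followed by one that is negative. Let $s'_{3n}$ denote the sum of the first $3n$ terms of (5.40). Then

$$\begin{aligned}
s'_{3n} &= \left(1 + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{2}}\right) + \left(\frac{1}{\sqrt{5}} + \frac{1}{\sqrt{7}} - \frac{1}{\sqrt{4}}\right) + \cdots + \left(\frac{1}{\sqrt{4n-3}} + \frac{1}{\sqrt{4n-1}} - \frac{1}{\sqrt{2n}}\right) \\
&= \left(1 - \frac{1}{\sqrt{2}}\right) + \left(\frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}}\right) + \cdots + \left(\frac{1}{\sqrt{2n-1}} - \frac{1}{\sqrt{2n}}\right) + \frac{1}{\sqrt{2n+1}} + \frac{1}{\sqrt{2n+3}} + \cdots + \frac{1}{\sqrt{4n-1}} \\
&= s_{2n} + \frac{1}{\sqrt{2n+1}} + \frac{1}{\sqrt{2n+3}} + \cdots + \frac{1}{\sqrt{4n-1}},
\end{aligned}$$

where $s_{2n}$ is the sum of the first $2n$ terms of the original series. Since each of the $n$ terms $1/\sqrt{2n+1}, 1/\sqrt{2n+3}, \ldots, 1/\sqrt{4n-1}$ is at least $1/\sqrt{4n-1}$, we note that

$$s'_{3n} \ge s_{2n} + \frac{n}{\sqrt{4n-1}}. \tag{5.41}$$

If $s$ is the sum of the original series, then $\lim_{n \to \infty} s_{2n} = s$ in (5.41). But, since

$$\lim_{n \to \infty} \frac{n}{\sqrt{4n-1}} = \infty,$$

the sequence $\{s'_{3n}\}_{n=1}^{\infty}$ is not convergent, which implies that the series in (5.40) is divergent. This clearly shows that a rearrangement of a conditionally convergent series can change its character. This rather unsettling characteristic of conditionally convergent series is depicted in the following theorem due to Georg Riemann (1826-1866):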


Theorem 5.2.16. A conditionally convergent series can always be rearranged so as to converge to any given number $s$, or to diverge to $+\infty$ or to $-\infty$.

Proof. The proof can be found in several books, for example, Apostol (1964, page 368), Fulks (1978, page 489), Knopp (1951, page 318), and Rudin (1964, page 67). □
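The proof of Theorem 5.2.16 is not reproduced here, but the usual constructive idea (add unused positive terms until the running sum exceeds the target, then unused negative terms until it drops below, and repeat) can be sketched in Python. This greedy scheme is offered as an illustration under that assumption, not as the proof given in the cited references; it is applied to the alternating harmonic series (5.39), whose positive and negative parts both diverge while its terms tend to zero.

```python
import math

def riemann_rearrangement(target, n_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ...: take the next unused
    positive term 1/1, 1/3, 1/5, ... while the running sum is <= target,
    otherwise take the next unused negative term -1/2, -1/4, ...
    Returns the list of partial sums of the rearranged series."""
    next_odd, next_even = 1, 2
    total, sums = 0.0, []
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
        sums.append(total)
    return sums

for target in (0.0, 1.0, math.pi):
    print(target, riemann_rearrangement(target, 200000)[-1])
```

After each crossing of the target, the running sum stays within the size of the most recent term used, and these terms tend to zero, which is why the partial sums approach the chosen target.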


5.2.4. Multiplication of Series

Suppose that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two series. We recall from Theorem 5.2.2 that if these series are convergent, then their sum is a convergent series obtained by adding the two series term by term. The product of these two series, however, requires a more delicate operation. There are several ways to define this product. We shall consider the so-called Cauchy's product.

Definition 5.2.5. Let $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ be two series in which the summation index starts at zero instead of one. Cauchy's product of these two series is the series $\sum_{n=0}^{\infty} c_n$, where

$$c_n = \sum_{k=0}^{n} a_k b_{n-k}, \quad n = 0, 1, 2, \ldots,$$

that is,

$$\sum_{n=0}^{\infty} c_n = a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots.$$

Other products could have been defined by simply adopting different arrangements of the terms that make up the series $\sum_{n=0}^{\infty} c_n$. □
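Definition 5.2.5 translates directly into code. The Python sketch below (function and variable names are illustrative) forms the first terms of Cauchy's product of two coefficient lists. As a check, it multiplies the exponential series $\sum x^n/n!$ at $x = 1$ and at $x = 2$; by the binomial theorem $c_n = 3^n/n!$, so the product series sums to $e^3$, consistent with the absolute convergence of both factors.

```python
import math

def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a_k b_{n-k} (Definition 5.2.5),
    for n = 0, ..., min(len(a), len(b)) - 1."""
    n_max = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_max)]

N = 40
a = [1.0 / math.factorial(n) for n in range(N)]        # terms summing to e
b = [2.0 ** n / math.factorial(n) for n in range(N)]   # terms summing to e^2
c = cauchy_product(a, b)
print(c[3], sum(c), math.exp(3))   # c_3 = 3^3/3! = 4.5
```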

The question now is: under what condition will Cauchy's product of two series converge? The answer to this question is given in the next theorem.

Theorem 5.2.17. Let $\sum_{n=0}^{\infty} c_n$ be Cauchy's product of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. Suppose that these two series are convergent and have sums equal to $s$ and $t$, respectively.

1. If at least one of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ converges absolutely, then $\sum_{n=0}^{\infty} c_n$ converges and its sum is equal to $st$ (this result is known as Mertens's theorem).

2. If both series are absolutely convergent, then $\sum_{n=0}^{\infty} c_n$ converges absolutely to the product $st$ (this result is due to Cauchy).

Proof.

1. Suppose that $\sum_{n=0}^{\infty} a_n$ is the series that converges absolutely. Let $s_n$, $t_n$, and $u_n$ denote the partial sums $\sum_{i=0}^{n} a_i$, $\sum_{i=0}^{n} b_i$, and $\sum_{i=0}^{n} c_i$, respectively. We need to show that $u_n \to st$ as $n \to \infty$. We have that

$$u_n = a_0 b_0 + (a_0 b_1 + a_1 b_0) + \cdots + (a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0) = a_0 t_n + a_1 t_{n-1} + \cdots + a_n t_0. \tag{5.42}$$

Let $\beta_n$ denote the remainder of the series $\sum_{n=0}^{\infty} b_n$ with respect to $t_n$, that is, $\beta_n = t - t_n$ $(n = 0, 1, 2, \ldots)$. By making the proper substitution in (5.42) we get

$$u_n = a_0(t - \beta_n) + a_1(t - \beta_{n-1}) + \cdots + a_n(t - \beta_0) = t s_n - (a_0 \beta_n + a_1 \beta_{n-1} + \cdots + a_n \beta_0). \tag{5.43}$$

Since $s_n \to s$ as $n \to \infty$, the proof of (1) will be complete if we can show that the sum inside parentheses in (5.43) goes to zero as $n \to \infty$. We now proceed to show that this is the case.

Let $\epsilon > 0$ be given. Since the sequence $\{\beta_n\}_{n=0}^{\infty}$ converges to zero, there exists an integer $N$ such that

$$|\beta_n| < \epsilon \quad \text{if } n > N.$$

Hence,

$$\begin{aligned}
|a_0 \beta_n + a_1 \beta_{n-1} + \cdots + a_n \beta_0| &\le |a_n \beta_0 + a_{n-1} \beta_1 + \cdots + a_{n-N} \beta_N| \\
&\quad + |a_{n-N-1} \beta_{N+1} + a_{n-N-2} \beta_{N+2} + \cdots + a_0 \beta_n| \\
&\le B \sum_{i=n-N}^{n} |a_i| + \epsilon \sum_{i=0}^{n-N-1} |a_i| \\
&\le B \sum_{i=n-N}^{n} |a_i| + \epsilon s^*,
\end{aligned} \tag{5.44}$$

where $B = \max(|\beta_0|, |\beta_1|, \ldots, |\beta_N|)$ and $s^*$ is the sum of the series $\sum_{n=0}^{\infty} |a_n|$ ($\sum_{n=0}^{\infty} a_n$ is absolutely convergent). Furthermore, because of this and by the Cauchy criterion we can find an integer $M$ such that

$$\sum_{i=n-N}^{n} |a_i| < \epsilon \quad \text{if } n - N > M + 1.$$

Thus when $n > N + M + 1$ we get from inequality (5.44)

$$|a_0 \beta_n + a_1 \beta_{n-1} + \cdots + a_n \beta_0| \le \epsilon(B + s^*).$$

Since $\epsilon$ can be arbitrarily small, we conclude that

$$\lim_{n \to \infty} (a_0 \beta_n + a_1 \beta_{n-1} + \cdots + a_n \beta_0) = 0.$$

2. Let $v_n$ denote the $n$th partial sum of $\sum_{i=0}^{\infty} |c_i|$. Then

$$\begin{aligned}
v_n &= |a_0 b_0| + |a_0 b_1 + a_1 b_0| + \cdots + |a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0| \\
&\le |a_0||b_0| + |a_0||b_1| + |a_1||b_0| + \cdots + |a_0||b_n| + |a_1||b_{n-1}| + \cdots + |a_n||b_0| \\
&= |a_0| t_n^* + |a_1| t_{n-1}^* + \cdots + |a_n| t_0^*,
\end{aligned}$$

where $t_i^* = \sum_{j=0}^{i} |b_j|$, $i = 0, 1, 2, \ldots, n$. Thus,

$$v_n \le (|a_0| + |a_1| + \cdots + |a_n|)\, t_n^* \le s^* t^* \quad \text{for all } n,$$

where $t^*$ is the sum of the series $\sum_{n=0}^{\infty} |b_n|$, which is convergent by assumption. We conclude that the sequence $\{v_n\}_{n=0}^{\infty}$ is bounded. Since $v_n \ge 0$, then by Exercise 5.12 this sequence is convergent, and therefore $\sum_{n=0}^{\infty} c_n$ converges absolutely. By part (1), the sum of this series is $st$. □

It should be noted that absolute convergence of at least one of $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ is an essential condition for the validity of part (1) of Theorem 5.2.17. If this condition is not satisfied, then $\sum_{n=0}^{\infty} c_n$ may not converge. For example, consider the series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$, where

$$a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}, \quad n = 0, 1, \ldots.$$

These two series are convergent by Theorem 5.2.14. They are not, however, absolutely convergent, and their Cauchy's product is divergent (see Exercise 5.22).
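The divergence in this example can be seen directly. Each product $a_k b_{n-k}$ equals $(-1)^n/\sqrt{(k+1)(n-k+1)}$, and since $\sqrt{(k+1)(n-k+1)} \le (n+2)/2$ by the arithmetic-geometric mean inequality, $|c_n| \ge 2(n+1)/(n+2) \ge 1$; the terms of $\sum c_n$ therefore do not tend to zero. The Python sketch below (illustrative only) computes $c_n$ and compares it with this lower bound.

```python
import math

def a(n):
    """a_n = b_n = (-1)^n / sqrt(n + 1), n = 0, 1, ..."""
    return (-1) ** n / math.sqrt(n + 1)

def c(n):
    """nth term of Cauchy's product of sum a_n with itself."""
    return sum(a(k) * a(n - k) for k in range(n + 1))

for n in (0, 1, 10, 100, 1000):
    # |c_n| never drops below the lower bound 2(n+1)/(n+2)
    print(n, c(n), 2 * (n + 1) / (n + 2))
```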

5.3. SEQUENCES AND SERIES OF FUNCTIONS 

All the sequences and series considered thus far in this chapter had constant terms. We now extend our study to sequences and series whose terms are functions of $x$.

Definition 5.3.1. Let $\{f_n(x)\}_{n=1}^{\infty}$ be a sequence of functions defined on a set $D \subset R$.

1. If there exists a function $f(x)$ defined on $D$ such that for every $x$ in $D$,
$$\lim_{n \to \infty} f_n(x) = f(x),$$
then the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is said to converge to $f(x)$ on $D$. Thus for a given $\epsilon > 0$ there exists an integer $N$ such that $|f_n(x) - f(x)| < \epsilon$ if $n > N$. In general, $N$ depends on $\epsilon$ as well as on $x$.

2. If $\sum_{n=1}^{\infty} f_n(x)$ converges for every $x$ in $D$ to $s(x)$, then $s(x)$ is said to be the sum of the series. In this case, for a given $\epsilon > 0$ there exists an integer $N$ such that
$$|s_n(x) - s(x)| < \epsilon \quad \text{if } n > N,$$
where $s_n(x)$ is the $n$th partial sum of the series $\sum_{n=1}^{\infty} f_n(x)$. The integer $N$ depends on $\epsilon$ and, in general, on $x$ also.

3. In particular, if $N$ in (1) depends on $\epsilon$ but not on $x \in D$, then the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is said to converge uniformly to $f(x)$ on $D$. Similarly, if $N$ in (2) depends on $\epsilon$, but not on $x \in D$, then the series $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to $s(x)$ on $D$. □

The Cauchy criterion for sequences (Theorem 5.1.6) and its application to series (Theorem 5.2.1) apply to sequences and series of functions. In the case of uniform convergence, the integer $N$ described in this criterion depends only on $\epsilon$.

Theorem 5.3.1. Let $\{f_n(x)\}_{n=1}^{\infty}$ be a sequence of functions defined on $D \subset R$ and converging to $f(x)$. Define the number $\lambda_n$ as

$$\lambda_n = \sup_{x \in D} |f_n(x) - f(x)|.$$

Then the sequence converges uniformly to $f(x)$ on $D$ if and only if $\lambda_n \to 0$ as $n \to \infty$.

Proof. Sufficiency: Suppose that $\lambda_n \to 0$ as $n \to \infty$. To show that $f_n(x) \to f(x)$ uniformly on $D$, let $\epsilon > 0$ be given. Then there exists an integer $N$ such that for $n > N$, $\lambda_n < \epsilon$. Hence, for such values of $n$,

$$|f_n(x) - f(x)| \le \lambda_n < \epsilon$$

for all $x \in D$. Since $N$ depends only on $\epsilon$, the sequence $\{f_n(x)\}_{n=1}^{\infty}$ converges uniformly to $f(x)$ on $D$.

Necessity: Suppose that $f_n(x) \to f(x)$ uniformly on $D$. To show that $\lambda_n \to 0$, let $\epsilon > 0$ be given. There exists an integer $N$ that depends only on $\epsilon$ such that for $n > N$,

$$|f_n(x) - f(x)| < \frac{\epsilon}{2}$$

for all $x \in D$. It follows that

$$\lambda_n = \sup_{x \in D} |f_n(x) - f(x)| \le \frac{\epsilon}{2} < \epsilon \quad \text{if } n > N.$$

Thus $\lambda_n \to 0$ as $n \to \infty$. □

Theorem 5.3.1 can be applied to convergent series of functions by replacing $f_n(x)$ and $f(x)$ with $s_n(x)$ and $s(x)$, respectively, where $s_n(x)$ is the $n$th partial sum of the series and $s(x)$ is its sum.


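Example 5.3.1. Let $f_n(x) = \sin(2\pi x/n)$, $0 \le x \le 1$. Then $f_n(x) \to 0$ as $n \to \infty$. Furthermore,

$$\left|\sin\left(\frac{2\pi x}{n}\right)\right| \le \frac{2\pi x}{n} \le \frac{2\pi}{n}.$$

In this case,

$$\lambda_n = \sup_{0 \le x \le 1} \left|\sin\left(\frac{2\pi x}{n}\right)\right| \le \frac{2\pi}{n}.$$

Thus $\lambda_n \to 0$ as $n \to \infty$, and the sequence $\{f_n(x)\}_{n=1}^{\infty}$ converges uniformly to $f(x) = 0$ on $[0, 1]$.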
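Theorem 5.3.1 suggests a practical, if approximate, check: estimate $\lambda_n$ by maximizing $|f_n(x) - f(x)|$ over a fine grid. A grid maximum can only underestimate a supremum, so the Python sketch below is a heuristic illustration, not a proof. For Example 5.3.1 with $n \ge 4$, the maximum is attained at the grid point $x = 1$, so the estimate equals $\sin(2\pi/n)$ exactly, in agreement with the bound $2\pi/n$.

```python
import math

def lambda_n(n, grid_points=1001):
    """Grid approximation of sup_{0<=x<=1} |sin(2*pi*x/n) - 0| (Example 5.3.1)."""
    return max(abs(math.sin(2 * math.pi * (i / (grid_points - 1)) / n))
               for i in range(grid_points))

for n in (1, 2, 4, 10, 100, 1000):
    print(n, lambda_n(n), 2 * math.pi / n)
```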

The next theorem provides a simple test for uniform convergence of series of functions. It is due to Karl Weierstrass (1815-1897).

Theorem 5.3.2 (Weierstrass's M-test). Let $\sum_{n=1}^{\infty} f_n(x)$ be a series of functions defined on $D \subset R$. If there exists a sequence $\{M_n\}_{n=1}^{\infty}$ of constants such that

$$|f_n(x)| \le M_n, \quad n = 1, 2, \ldots,$$

for all $x \in D$, and if $\sum_{n=1}^{\infty} M_n$ converges, then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $D$.

Proof. Let $\epsilon > 0$ be given. By the Cauchy criterion (Theorem 5.2.1), there exists an integer $N$ such that

$$\sum_{i=m+1}^{n} M_i < \epsilon$$

for all $n > m > N$. Hence, for all such values of $m$, $n$, and for all $x \in D$,

$$\left|\sum_{i=m+1}^{n} f_i(x)\right| \le \sum_{i=m+1}^{n} |f_i(x)| \le \sum_{i=m+1}^{n} M_i < \epsilon.$$

This implies that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $D$ by the Cauchy criterion. □

We note that Weierstrass's M-test is easier to apply than Theorem 5.3.1, since it does not require specifying the sum $s(x)$ of the series.

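Example 5.3.2. Let us investigate convergence of the sequence $\{f_n(x)\}_{n=1}^{\infty}$, where $f_n(x)$ is defined as

$$f_n(x) = \begin{cases} 2x + \dfrac{1}{n}, & 0 \le x < 1, \\[4pt] \exp\left(\dfrac{x}{n}\right), & 1 \le x < 2, \\[4pt] 1 - \dfrac{1}{n}, & x \ge 2. \end{cases}$$

This sequence converges to

$$f(x) = \begin{cases} 2x, & 0 \le x < 1, \\ 1, & x \ge 1. \end{cases}$$

Now,

$$|f_n(x) - f(x)| = \begin{cases} 1/n, & 0 \le x < 1, \\ \exp(x/n) - 1, & 1 \le x < 2, \\ 1/n, & x \ge 2. \end{cases}$$

Moreover, for $1 \le x < 2$,

$$\exp(x/n) - 1 < \exp(2/n) - 1.$$

Furthermore, by Maclaurin's series expansion,

$$\exp\left(\frac{2}{n}\right) - 1 = \sum_{k=1}^{\infty} \frac{(2/n)^k}{k!} > \frac{2}{n} > \frac{1}{n}.$$

Thus,

$$\sup_{0 \le x < \infty} |f_n(x) - f(x)| = \exp(2/n) - 1,$$

which tends to zero as $n \to \infty$. Therefore, the sequence $\{f_n(x)\}_{n=1}^{\infty}$ converges uniformly to $f(x)$ on $[0, \infty)$.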
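The supremum in Example 5.3.2 can be checked on a grid using only the three expressions for $|f_n(x) - f(x)|$ displayed above, so the sketch does not depend on how $f_n$ itself is written for $x \ge 2$. Because $|f_n(x) - f(x)| = 1/n$ for all $x \ge 2$, a grid on $[0, 10]$ suffices. The supremum over $[1, 2)$ is not attained, so the grid value stays slightly below $\exp(2/n) - 1$; this Python code is an illustrative sketch only.

```python
import math

def abs_diff(n, x):
    """|f_n(x) - f(x)| for Example 5.3.2, using the three cases in the text."""
    if 0 <= x < 1:
        return 1.0 / n
    if 1 <= x < 2:
        return math.exp(x / n) - 1.0
    return 1.0 / n          # x >= 2

def approx_sup(n, upper=10.0, steps=100000):
    """Grid approximation of sup_{0 <= x <= upper} |f_n(x) - f(x)|."""
    return max(abs_diff(n, upper * i / steps) for i in range(steps + 1))

for n in (1, 10, 100, 1000):
    print(n, approx_sup(n), math.exp(2 / n) - 1)
```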


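Example 5.3.3. Consider the series $\sum_{n=1}^{\infty} f_n(x)$, where

$$f_n(x) = \frac{x^n}{n^3 + n x^n}, \quad 0 \le x \le 1.$$

The function $f_n(x)$ is monotone increasing with respect to $x$. It follows that for $0 \le x \le 1$,

$$f_n(x) = |f_n(x)| \le \frac{1}{n^3 + n}, \quad n = 1, 2, \ldots.$$

But the series $\sum_{n=1}^{\infty} [1/(n^3 + n)]$ is convergent. Hence, $\sum_{n=1}^{\infty} f_n(x)$ is uniformly convergent on $[0, 1]$ by Weierstrass's M-test.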


5.3.1. Properties of Uniformly Convergent Sequences and Series

Sequences and series of functions that are uniformly convergent have several interesting properties. We shall study some of these properties in this section.

Theorem 5.3.3. Let $\{f_n(x)\}_{n=1}^{\infty}$ be uniformly convergent to $f(x)$ on a set $D$. If for each $n$, $f_n(x)$ has a limit $\tau_n$ as $x \to x_0$, where $x_0$ is a limit point of $D$, then the sequence $\{\tau_n\}_{n=1}^{\infty}$ converges to $\tau_0 = \lim_{x \to x_0} f(x)$. This is equivalent to stating that

$$\lim_{n \to \infty}\left[\lim_{x \to x_0} f_n(x)\right] = \lim_{x \to x_0}\left[\lim_{n \to \infty} f_n(x)\right].$$

Proof. Let us first show that $\{\tau_n\}_{n=1}^{\infty}$ is a convergent sequence. By the Cauchy criterion (Theorem 5.1.6), there exists an integer $N$ such that for a given $\epsilon > 0$,

$$|f_m(x) - f_n(x)| < \frac{\epsilon}{2} \quad \text{for all } m > N,\ n > N. \tag{5.45}$$

The integer $N$ depends only on $\epsilon$, and inequality (5.45) is true for all $x \in D$, since the sequence is uniformly convergent. By taking the limit as $x \to x_0$ in (5.45) we get

$$|\tau_m - \tau_n| \le \frac{\epsilon}{2} \quad \text{if } m > N,\ n > N,$$

which indicates that $\{\tau_n\}_{n=1}^{\infty}$ is a Cauchy sequence and is therefore convergent. Let $\tau_0 = \lim_{n \to \infty} \tau_n$. We now need to show that $f(x)$ has a limit and that this limit is equal to $\tau_0$. Let $\epsilon > 0$ be given. There exists an integer $N_1$ such that for $n > N_1$,

$$|f(x) - f_n(x)| < \frac{\epsilon}{4}$$

for all $x \in D$, by the uniform convergence of the sequence. Furthermore, there exists an integer $N_2$ such that

$$|\tau_n - \tau_0| < \frac{\epsilon}{4} \quad \text{if } n > N_2.$$

Thus for $n > \max(N_1, N_2)$,

$$|f(x) - f_n(x)| + |\tau_n - \tau_0| < \frac{\epsilon}{2}$$

for all $x \in D$. Then

$$|f(x) - \tau_0| \le |f(x) - f_n(x)| + |f_n(x) - \tau_n| + |\tau_n - \tau_0| < |f_n(x) - \tau_n| + \frac{\epsilon}{2} \tag{5.46}$$

if $n > \max(N_1, N_2)$ for $x \in D$. Fixing such an $n$ and taking the limit as $x \to x_0$ in (5.46), we get

$$\limsup_{x \to x_0} |f(x) - \tau_0| \le \frac{\epsilon}{2} \tag{5.47}$$

by the fact that

$$\lim_{x \to x_0} |f_n(x) - \tau_n| = 0 \quad \text{for } n = 1, 2, \ldots.$$

Since $\epsilon$ is arbitrarily small, inequality (5.47) implies that $\lim_{x \to x_0} f(x) = \tau_0$. □

Corollary 5.3.1. Let $\{f_n(x)\}_{n=1}^{\infty}$ be a sequence of continuous functions that converges uniformly to $f(x)$ on a set $D$. Then $f(x)$ is continuous on $D$.

Proof. The proof follows directly from Theorem 5.3.3, since $\tau_n = f_n(x_0)$ for $n \ge 1$ and $\tau_0 = \lim_{n \to \infty} \tau_n = \lim_{n \to \infty} f_n(x_0) = f(x_0)$. □



SEQUENCES AND SERIES OF FUNCTIONS 


171 


Corollary 5.3.2. Let = be a sériés of functions that converges 

uniformly to ^(x) on a set Z). If for each n, f^ix) bas a limit as x 
then the sériés converges and bas a sum equal to Sq = lim^^ -vCx), 

tbat is, 


CO 00 

lim E/«(^)= E lim/„(x). 

Proof Tbe proof is left to tbe reader. □ 

By combining Corollaries 5.3.1 and 5.3.2 we conclude tbe following corol- 
lary: 

Corollary 5.3.3. Let E“=i/„(x) be a sériés of continuons functions tbat 
converges uniformly to ^(x) on a set D. Tben ^(x) is continuons on D. 

Example5.3.4. Let /„(x) =x^/(l +x^)”“^ be defined on [1,^) for n>l. 
Let ^„(x) be tbe nth partial sum of tbe sériés E^=i/„(x). Tben, 

1 1 

1 ^ 

(l+x^) 

ï ’ 

1 y 

1 +x^ 


n 


^n(x) = E 


X 


k=l (1 +x ) 


k-1 


= x 


by using tbe fact tbat tbe sum of tbe finite géométrie sériés ^ is 


n 




k = l 


1— a 


(5.48) 


Since 1/(1 +x^) < 1 for x > 1, tben as n ^ 


^n(x) 


X 


1 - 


1 


= 1 +x 


2 


1 +x 


Tbus, 


CO 


E 


X 


2 


n 


= 1 (l+x^ 


n — 1 


= 1 +X^. 



172 


INFINITE SEQUENCES AND SERIES 


Now, let Xq = 1, then 


00 


2 


oo 


1 


1 


y lim ^ = y — T = — 

f = l (1 +X^) n = l ^ 1 “ 


= 2 


= lim (1 + x ), 

x^l 


which results from applying formula (5.48) with a = ^ and then letting 
n ^ œ, This provides a vérification to Corollary 5.3.2. Note that the sériés 
E“ = i/„(x) is uniformly convergent by Weierstrass’s M-test (why?). 
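The uniform convergence in Example 5.3.4 can also be confirmed with Theorem 5.3.1. Summing the geometric tail gives the remainder $(1 + x^2) - s_n(x) = (1 + x^2)^{1-n}$, which on $[1, \infty)$ is largest at $x = 1$; hence $\sup_{x \ge 1} |s(x) - s_n(x)| = 2^{1-n} \to 0$. The Python sketch below (illustrative only) checks this remainder formula numerically.

```python
import math

def s_n(x, n):
    """nth partial sum of sum_{k>=1} x^2 / (1 + x^2)^(k-1) (Example 5.3.4)."""
    return math.fsum(x * x / (1 + x * x) ** (k - 1) for k in range(1, n + 1))

def remainder(x, n):
    """Closed form of (1 + x^2) - s_n(x), obtained from formula (5.48)."""
    return (1 + x * x) ** (1 - n)

for x in (1.0, 1.5, 3.0):
    for n in (1, 5, 20):
        print(x, n, (1 + x * x) - s_n(x, n), remainder(x, n))
```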


Corollaries 5.3.2 and 5.3.3 clearly show that the properties of the functions $f_n(x)$ carry over to the sum $s(x)$ of the series $\sum_{n=1}^{\infty} f_n(x)$ when the series is uniformly convergent.

Another property that $s(x)$ shares with the $f_n(x)$'s is given by the following theorem:


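Theorem 5.3.4. Let $\sum_{n=1}^{\infty} f_n(x)$ be a series of functions, where $f_n(x)$ is differentiable on $[a, b]$ for $n \ge 1$. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ converges at least at one point $x_0 \in [a, b]$ and that $\sum_{n=1}^{\infty} f'_n(x)$ converges uniformly on $[a, b]$. Then we have the following:

1. $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to $s(x)$ on $[a, b]$.

2. $s'(x) = \sum_{n=1}^{\infty} f'_n(x)$, that is, the derivative of $s(x)$ is obtained by a term-by-term differentiation of the series $\sum_{n=1}^{\infty} f_n(x)$.

Proof.

1. Let $x \ne x_0$ be a point in $[a, b]$. By the mean value theorem (Theorem 4.2.2), there exists a point $\xi_n$ between $x$ and $x_0$ such that for $n \ge 1$,
$$f_n(x) - f_n(x_0) = (x - x_0) f'_n(\xi_n). \tag{5.49}$$
Let $\epsilon > 0$ be given. Since $\sum_{n=1}^{\infty} f'_n(x)$ is uniformly convergent on $[a, b]$, then by the Cauchy criterion, there exists an integer $N$ such that
$$\left|\sum_{i=m+1}^{n} f'_i(x)\right| < \frac{\epsilon}{b - a}$$
for all $n > m > N$ and for any $x \in [a, b]$. Applying the mean value theorem, as in (5.49), to the differentiable function $\sum_{i=m+1}^{n} f_i(x)$, we get, for some point $\xi$ between $x$ and $x_0$ (depending on $m$ and $n$),
$$\left|\sum_{i=m+1}^{n} [f_i(x) - f_i(x_0)]\right| = |x - x_0| \left|\sum_{i=m+1}^{n} f'_i(\xi)\right| < \frac{\epsilon}{b - a}\,|x - x_0| \le \epsilon$$
for all $n > m > N$ and for any $x \in [a, b]$. This shows that
$$\sum_{n=1}^{\infty} [f_n(x) - f_n(x_0)]$$
is uniformly convergent on $[a, b]$. Consequently,
$$\sum_{n=1}^{\infty} f_n(x) = \sum_{n=1}^{\infty} [f_n(x) - f_n(x_0)] + s(x_0)$$
is uniformly convergent to $s(x)$ on $[a, b]$, where $s(x_0)$ is the sum of the series $\sum_{n=1}^{\infty} f_n(x_0)$, which was assumed to be convergent.

2. Let $\phi_n(h)$ denote the ratio
$$\phi_n(h) = \frac{f_n(x + h) - f_n(x)}{h}, \quad n = 1, 2, \ldots,$$
where both $x$ and $x + h$ belong to $[a, b]$. By invoking the mean value theorem again, $\phi_n(h)$ can be written as
$$\phi_n(h) = f'_n(x + \theta_n h), \quad n = 1, 2, \ldots,$$
where $0 < \theta_n < 1$. Furthermore, by the uniform convergence of $\sum_{n=1}^{\infty} f'_n(x)$ we can deduce that $\sum_{n=1}^{\infty} \phi_n(h)$ is also uniformly convergent on $[-r, r]$ for some $r > 0$. But
$$\sum_{n=1}^{\infty} \phi_n(h) = \sum_{n=1}^{\infty} \frac{f_n(x + h) - f_n(x)}{h} = \frac{s(x + h) - s(x)}{h}, \tag{5.50}$$
where $s(x)$ is the sum of the series $\sum_{n=1}^{\infty} f_n(x)$. Let us now apply Corollary 5.3.2 to $\sum_{n=1}^{\infty} \phi_n(h)$. We get
$$\lim_{h \to 0} \sum_{n=1}^{\infty} \phi_n(h) = \sum_{n=1}^{\infty} \lim_{h \to 0} \phi_n(h) = \sum_{n=1}^{\infty} f'_n(x). \tag{5.51}$$
From (5.50) and (5.51) we then have
$$\lim_{h \to 0} \frac{s(x + h) - s(x)}{h} = \sum_{n=1}^{\infty} f'_n(x).$$
Thus,
$$s'(x) = \sum_{n=1}^{\infty} f'_n(x).$$
□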





5.4. POWER SERIES 


A power series is a special case of the series of functions discussed in Section 5.3. It is of the form $\sum_{n=0}^{\infty} a_n x^n$, where the $a_n$'s are constants. We have already encountered such series in connection with Taylor's and Maclaurin's series in Section 4.3.

Obviously, just as with any series of functions, the convergence of a power series depends on the values of $x$. By definition, if there exists a number $\rho > 0$ such that $\sum_{n=0}^{\infty} a_n x^n$ is convergent if $|x| < \rho$ and is divergent if $|x| > \rho$, then $\rho$ is said to be the radius of convergence of the series, and the interval $(-\rho, \rho)$ is called the interval of convergence. The set of all values of $x$ for which the power series converges is called its region of convergence.

The definition of the radius of convergence implies that $\sum_{n=0}^{\infty} a_n x^n$ is absolutely convergent within its interval of convergence. This is shown in the next theorem.


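Theorem 5.4.1. Let $\rho$ be the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$. Suppose that $\rho > 0$. Then $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely for all $x$ inside the interval $(-\rho, \rho)$.

Proof. Let $x$ be such that $|x| < \rho$. There exists a point $x_0 \in (-\rho, \rho)$ such that $|x| < |x_0|$. Then, $\sum_{n=0}^{\infty} a_n x_0^n$ is a convergent series. By Result 5.2.1, $a_n x_0^n \to 0$ as $n \to \infty$, and hence $\{a_n x_0^n\}_{n=0}^{\infty}$ is a bounded sequence by Theorem 5.1.1. Thus

$$|a_n x_0^n| < K \quad \text{for all } n.$$

Now,

$$|a_n x^n| = |a_n x_0^n| \left|\frac{x}{x_0}\right|^n < K \eta^n,$$

where

$$\eta = \left|\frac{x}{x_0}\right| < 1.$$

Since the geometric series $\sum_{n=0}^{\infty} \eta^n$ is convergent, then by the comparison test (see Theorem 5.2.4), the series $\sum_{n=0}^{\infty} |a_n x^n|$ is convergent. □

To determine the radius of convergence we shall rely on some of the tests of convergence given in Section 5.2.1.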

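Theorem 5.4.2. Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series. Suppose that

$$\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = p.$$

Then the radius of convergence of the power series is

$$\rho = \begin{cases} 1/p, & 0 < p < \infty, \\ \infty, & p = 0, \\ 0, & p = \infty. \end{cases}$$

Proof. The proof follows from applying the ratio test given in Theorem 5.2.6 to the series $\sum_{n=0}^{\infty} |a_n x^n|$. We have that if

$$\lim_{n \to \infty} \left|\frac{a_{n+1} x^{n+1}}{a_n x^n}\right| < 1,$$

then $\sum_{n=0}^{\infty} a_n x^n$ is absolutely convergent. This inequality can be written as

$$p\,|x| < 1. \tag{5.52}$$

If $0 < p < \infty$, then absolute convergence occurs if $|x| < 1/p$ and the series diverges when $|x| > 1/p$. Thus $\rho = 1/p$. If $p = \infty$, the series diverges whenever $x \ne 0$. In this case, $\rho = 0$. If $p = 0$, then (5.52) holds for any value of $x$, that is, $\rho = \infty$. □

Theorem 5.4.3. Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series. Suppose that

$$\limsup_{n \to \infty} |a_n|^{1/n} = q.$$

Then,

$$\rho = \begin{cases} 1/q, & 0 < q < \infty, \\ \infty, & q = 0, \\ 0, & q = \infty. \end{cases}$$

Proof. This result follows from applying the root test in Theorem 5.2.8 to the series $\sum_{n=0}^{\infty} |a_n x^n|$. Details of the proof are similar to those given in Theorem 5.4.2. □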

The determination of the region of convergence of $\sum_{n=0}^{\infty} a_n x^n$ depends on the value of $\rho$. We know that the series converges if $|x| < \rho$ and diverges if $|x| > \rho$. The convergence of the series at $x = \rho$ and $x = -\rho$ has to be determined separately. Thus the region of convergence can be $(-\rho, \rho)$, $[-\rho, \rho)$, $(-\rho, \rho]$, or $[-\rho, \rho]$.

Example 5.4.1. Consider the geometric series $\sum_{n=0}^{\infty} x^n$. By applying either Theorem 5.4.2 or Theorem 5.4.3, it is easy to show that $\rho = 1$. The series diverges if $x = 1$ or $-1$. Thus the region of convergence is $(-1, 1)$. The sum of this series can be obtained from formula (5.48) by letting $n$ go to infinity. Thus

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}, \quad -1 < x < 1. \tag{5.53}$$

Example 5.4.2. Consider the series $\sum_{n=0}^{\infty} (x^n/n!)$. Here,

$$\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{n!}{(n + 1)!} = \lim_{n \to \infty} \frac{1}{n + 1} = 0.$$

Thus $\rho = \infty$, and the series converges absolutely for any value of $x$. This particular series is Maclaurin's expansion of $e^x$, that is,

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

Example 5.4.3. Suppose we have the series $\sum_{n=1}^{\infty} (x^n/n)$. Then

$$\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{n}{n + 1} = 1,$$

and $\rho = 1$. When $x = 1$ we get the harmonic series, which is divergent. When $x = -1$ we get the series $\sum_{n=1}^{\infty} (-1)^n/n$, the negative of the alternating harmonic series, which is convergent by Theorem 5.2.14. Thus the region of convergence is $[-1, 1)$.
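Theorems 5.4.2 and 5.4.3 are statements about limits, but evaluating the ratio $|a_{n+1}/a_n|$ and the root $|a_n|^{1/n}$ at a single large $n$ gives a rough sense of the radius of convergence. The Python sketch below (a heuristic illustration with hypothetical helper names) does this for the coefficients of Examples 5.4.2 and 5.4.3. Finite-$n$ values can be misleading in general, and the root quantity in particular converges slowly.

```python
import math

def ratio_estimate(a, n):
    """|a_{n+1} / a_n| at a single n (its limit is p in Theorem 5.4.2)."""
    return abs(a(n + 1) / a(n))

def root_estimate(a, n):
    """|a_n|^(1/n) at a single n (its limsup is q in Theorem 5.4.3)."""
    return abs(a(n)) ** (1.0 / n)

examples = {
    "x^n/n  (Example 5.4.3)": lambda n: 1.0 / n,
    "x^n/n! (Example 5.4.2)": lambda n: 1.0 / math.factorial(n),
}
for name, a in examples.items():
    p_hat, q_hat = ratio_estimate(a, 150), root_estimate(a, 150)
    # values near 1 suggest rho = 1; values near 0 suggest rho = infinity
    print(name, p_hat, q_hat)
```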

In addition to being absolutely convergent within its interval of convergence, a power series is also uniformly convergent on every closed interval $[-r, r]$ with $r < \rho$. This is shown in the next theorem.

Theorem 5.4.4. Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series with a radius of convergence $\rho\ (> 0)$. Then we have the following:

1. The series converges uniformly on the interval $[-r, r]$, where $r < \rho$.

2. If $s(x) = \sum_{n=0}^{\infty} a_n x^n$, then $s(x)$ (i) is continuous on $[-r, r]$; (ii) is differentiable on $[-r, r]$ and has derivative
$$s'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \quad -r \le x \le r;$$
and (iii) has derivatives of all orders on $[-r, r]$ and
$$\frac{d^k s(x)}{dx^k} = \sum_{n=k}^{\infty} \frac{a_n\, n!}{(n - k)!} x^{n-k}, \quad k = 1, 2, \ldots, \quad -r \le x \le r.$$

Proof.

1. If $|x| \le r$, then $|a_n x^n| \le |a_n| r^n$ for $n \ge 0$. Since $\sum_{n=0}^{\infty} |a_n| r^n$ is convergent by Theorem 5.4.1, then by the Weierstrass M-test (Theorem 5.3.2), $\sum_{n=0}^{\infty} a_n x^n$ is uniformly convergent on $[-r, r]$.

2. (i) Continuity of $s(x)$ follows directly from Corollary 5.3.3. (ii) To show this result, we first note that the two series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=1}^{\infty} n a_n x^{n-1}$ have the same radius of convergence. This is true by Theorem 5.4.3 and the fact that
$$\limsup_{n \to \infty} |n a_n|^{1/n} = \limsup_{n \to \infty} |a_n|^{1/n},$$
since $n^{1/n} \to 1$ as $n \to \infty$. We can then assert that $\sum_{n=1}^{\infty} n a_n x^{n-1}$ is uniformly convergent on $[-r, r]$. By Theorem 5.3.4, $s(x)$ is differentiable on $[-r, r]$, and its derivative is obtained by a term-by-term differentiation of $\sum_{n=0}^{\infty} a_n x^n$. (iii) This follows from part (ii) by repeated differentiation of $s(x)$. □

Under a certain condition, the interval on which the power series converges uniformly can include the end points of the interval of convergence. This is discussed in the next theorem.

Theorem 5.4.5. Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series with a finite nonzero radius of convergence $\rho$. If $\sum_{n=0}^{\infty} a_n \rho^n$ is absolutely convergent, then the power series is uniformly convergent on $[-\rho, \rho]$.

Proof. The proof is similar to that of part 1 of Theorem 5.4.4. In this case, for $|x| \le \rho$, $|a_n x^n| \le |a_n| \rho^n$. Since $\sum_{n=0}^{\infty} |a_n| \rho^n$ is convergent, then $\sum_{n=0}^{\infty} a_n x^n$ is uniformly convergent on $[-\rho, \rho]$ by the Weierstrass M-test. □

Example 5.4.4. Consider the geometric series of Example 5.4.1. This series is uniformly convergent on $[-r, r]$, where $r < 1$. Furthermore, by differentiating the two sides of (5.53) we get

$$\sum_{n=1}^{\infty} n x^{n-1} = \frac{1}{(1 - x)^2}, \quad -1 < x < 1.$$

This provides a series expansion of $1/(1 - x)^2$ within the interval $(-1, 1)$. By repeated differentiation it is easy to show that for $-1 < x < 1$,

$$\frac{1}{(1 - x)^k} = \sum_{n=0}^{\infty} \binom{n + k - 1}{n} x^n, \quad k = 1, 2, \ldots.$$

The radius of convergence of this series is $\rho = 1$, the same as for the original series.
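The expansion of $1/(1-x)^k$ can be spot-checked numerically. The Python sketch below (illustrative only; it requires Python 3.8 or later for `math.comb`) sums the first 2000 terms of $\sum_{n \ge 0} \binom{n+k-1}{n} x^n$ and compares the result with $1/(1-x)^k$ at a few points inside $(-1, 1)$.

```python
import math

def truncated_series(x, k, n_terms):
    """sum_{n=0}^{n_terms-1} C(n+k-1, n) x^n, a truncation of 1/(1-x)^k."""
    return math.fsum(math.comb(n + k - 1, n) * x ** n for n in range(n_terms))

for k in (1, 2, 3):
    for x in (-0.5, 0.3, 0.9):
        print(k, x, truncated_series(x, k, 2000), 1 / (1 - x) ** k)
```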

Example 5.4.5. Suppose we have the series

$$\sum_{n=1}^{\infty} \frac{1}{2n^2 + n} \left(\frac{2x}{1 - x}\right)^n,$$

which can be written as

$$\sum_{n=1}^{\infty} \frac{z^n}{2n^2 + n},$$

where $z = 2x/(1 - x)$. This is a power series in $z$. By Theorem 5.4.2, the radius of convergence of this series is $\rho = 1$. We note that when $z = 1$ the series $\sum_{n=1}^{\infty} [1/(2n^2 + n)]$ is absolutely convergent. Thus by Theorem 5.4.5, the given series is uniformly convergent for $|z| \le 1$, that is, for values of $x$ satisfying

$$-\frac{1}{2} \le \frac{x}{1 - x} \le \frac{1}{2},$$

or equivalently,

$$-1 \le x \le \frac{1}{3}.$$

5.5. SEQUENCES AND SERIES OF MATRICES 

In Section 5.3 we considered sequences and series whose terms were scalar functions of $x$ rather than being constant as was done in Sections 5.1 and 5.2. In this section we consider yet another extension, in which the terms of the series are matrices rather than scalars. We shall provide a brief discussion of this extension. The interested reader can find a more detailed study of this topic in Gantmacher (1959), Lancaster (1969), and Graybill (1983). As in Chapter 2, all matrix elements considered here are real.

For the purpose of our study of sequences and series of matrices we first need to introduce the norm of a matrix.

Definition 5.5.1. Let $A$ be a matrix of order $m \times n$. A norm of $A$, denoted by $\|A\|$, is a real-valued function of $A$ with the following properties:

1. $\|A\| \ge 0$, and $\|A\| = 0$ if and only if $A = 0$.

2. $\|cA\| = |c|\,\|A\|$, where $c$ is a scalar.

3. $\|A + B\| \le \|A\| + \|B\|$, where $B$ is any matrix of order $m \times n$.

4. $\|AC\| \le \|A\|\,\|C\|$, where $C$ is any matrix for which the product $AC$ is defined. □

If $A = (a_{ij})$, then examples of matrix norms that satisfy properties 1, 2, 3, and 4 include the following:

1. The Euclidean norm, $\|A\|_2 = \left(\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2\right)^{1/2}$.

2. The spectral norm, $\|A\|_s = [e_{\max}(A'A)]^{1/2}$, where $e_{\max}(A'A)$ is the largest eigenvalue of $A'A$.
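Both norms above are easy to compute for small matrices. The following pure-Python sketch (helper names are illustrative; in practice a library such as NumPy would normally be used) evaluates $\|A\|_2$ and, for $2 \times 2$ matrices, $\|A\|_s$ from the closed-form largest eigenvalue of the symmetric matrix $A'A$, then checks property 4 of Definition 5.5.1 on a few random matrices, allowing a small tolerance for rounding.

```python
import math
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def euclidean_norm(A):
    """||A||_2 = (sum of squared entries)^(1/2)."""
    return math.sqrt(sum(x * x for row in A for x in row))

def spectral_norm_2x2(A):
    """||A||_s = [e_max(A'A)]^(1/2) for a 2 x 2 matrix A.
    A'A = [[p, q], [q, r]] is symmetric, so its largest eigenvalue is
    (p + r)/2 + sqrt(((p - r)/2)^2 + q^2)."""
    (p, q), (_, r) = matmul(transpose(A), A)
    e_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(e_max)

random.seed(0)
for _ in range(5):
    A = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    C = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    AC = matmul(A, C)
    # Property 4 (submultiplicativity), with a small tolerance for rounding.
    print(euclidean_norm(AC) <= euclidean_norm(A) * euclidean_norm(C) + 1e-12,
          spectral_norm_2x2(AC) <= spectral_norm_2x2(A) * spectral_norm_2x2(C) + 1e-12)
```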

Definition 5.5.2. Let $A_k = (a_{ijk})$ be matrices of orders $m \times n$ for $k \ge 1$. The sequence $\{A_k\}_{k=1}^{\infty}$ is said to converge to the $m \times n$ matrix $A = (a_{ij})$ if

$$\lim_{k \to \infty} a_{ijk} = a_{ij} \quad \text{for } i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n.$$

□

For example, the sequence of matrices 




1 

k 


1 

ë 


- 1 


0 


k-\ 
Æ + 1 

e 

2-k^ 


k = ^ ? 

/V A. ^ ^ ^ ^ 


converges to 


$$A = \begin{bmatrix} 0 & -1 & 1 \\ 2 & 0 & -1 \end{bmatrix}$$

as $k \to \infty$. The sequence




1 

k 


- k ^-2 


1 


k 

\ k 


k> 1 , 2 , . . . , 


00 

» 


does not converge, since $k^2 - 2$ goes to infinity as $k \to \infty$.

From Definition 5.5.2 it is easy to see that $\{A_k\}_{k=1}^{\infty}$ converges to $A$ if and only if

$$\lim_{k \to \infty} \|A_k - A\| = 0,$$

where $\|\cdot\|$ is any matrix norm.


Definition 5.5.3. Let $\{A_k\}_{k=1}^{\infty}$ be a sequence of matrices of order $m \times n$. Then $\sum_{k=1}^{\infty} A_k$ is called an infinite series (or just a series) of matrices. This series is said to converge to the $m \times n$ matrix $S = (s_{ij})$ if and only if the series $\sum_{k=1}^{\infty} a_{ijk}$ converges for all $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$, where $a_{ijk}$ is the $(i, j)$th element of $A_k$, and

$$\sum_{k=1}^{\infty} a_{ijk} = s_{ij}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n. \tag{5.54}$$

The series $\sum_{k=1}^{\infty} A_k$ is divergent if at least one of the series in (5.54) is divergent. □

From Definition 5.5.3 and Result 5.2.1 we conclude that $\sum_{k=1}^{\infty} A_k$ diverges if $a_{ijk}$ does not tend to zero as $k \to \infty$ for at least one pair $(i, j)$, that is, if $A_k$ does not converge to $0$.

A particular type of infinite series of matrices is the power series $\sum_{k=0}^{\infty} \alpha_k A^k$, where $A$ is a square matrix, $\alpha_k$ is a scalar ($k = 0, 1, \ldots$), and $A^0$ is by definition the identity matrix $I$. For example, the power series

$$I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots + \frac{1}{k!}A^k + \cdots$$

represents an expansion of the exponential matrix function $\exp(A)$ (see Gantmacher, 1959).
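Truncating this power series gives a simple, if naive, way to approximate $\exp(A)$ for a small matrix. The pure-Python sketch below is illustrative only; production code would normally call a library routine such as SciPy's `scipy.linalg.expm`, which uses a more robust scaling-and-squaring method. It sums the first 30 terms. For a diagonal matrix the result should be the diagonal matrix of exponentials, and for a matrix with $A^2 = 0$ the series stops after $I + A$.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def exp_series(A, n_terms=30):
    """Truncated power series I + A + A^2/2! + ... + A^(n_terms-1)/(n_terms-1)!."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]    # A^0 / 0! = I
    total = [row[:] for row in term]
    for k in range(1, n_terms):
        term = mat_mul(term, A)
        term = [[x / k for x in row] for row in term]                # now A^k / k!
        total = [[t + s for t, s in zip(trow, srow)] for trow, srow in zip(total, term)]
    return total

print(exp_series([[1.0, 0.0], [0.0, 2.0]]), math.e, math.exp(2))   # approximately diag(e, e^2)
print(exp_series([[0.0, 1.0], [0.0, 0.0]]))                         # A^2 = 0, so exactly I + A
```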

Theorem 5.5.1. Let A be an n X n matrix. Then lim^ A^ = 0 if || A|| < 1, 
where ||*|l is any matrix norm. 

Proof From property 4 in Définition 5.5.1 we can write 

||A^||<||A||^ Æ= 1 , 2 ,... . 

Since || A|| < 1, then lim^^^ll A^|| = 0, which implies that lim^^^ A^ = 0 (why?). 

□ 

Theorem 5.5.2. Let A be a symmetric matrix of order nXn such that 
|A,| <1 for / = 1, 2, . . . , 7 î, where A, is the ith eigenvalue of A (ail the 
eigenvalues of A are real by Theorem 2.3.5). Then E^=o^^ converges to 



SEQUENCES AND SERIES OF MATRICES 


181 


Proof By the spectral décomposition theorem (Theorem 2.3.10) there 
exists an orthogonal matrix P such that A = PAP', where A is a diagonal 
matrix whose diagonal éléments are the eigenvalues of A. Then 

A^ = PA^P\ Æ = 0,l,2,... . 

Since | Aj <1 for ail then A^ ^ 0 and hence A^ ^ 0 as k ^ Further- 
more, the matrix I — A is nonsingular, since 

I-A = P(I- A)P' 

and ail the diagonal éléments of I — A are positive. 

Now, for any nonnegative integer k we hâve the following identity: 

(I-A)(I + A + A2+ ••• +A^) = I-A^+^ 


Hence, 


I + A+ ••• +A^ = (I-A) \l-A*+i). 


By letting k go to infinity we get 

00 

k = 0 


since lim^^^ A^"^^ =0. □ 

Theorem 5.5.3. Let A be a symmetric nXn matrix and A be any 
eigenvalue of A. Then | A| < || A||, where || A|| is any matrix norm of A. 


Proof. We hâve that Av = Av, where v is an eigenvector of A for the 
eigenvalue A. If || A|| is any matrix norm of A, then 


I Av|| = I A| ||v|| = Il Av II < Il A 


Since v ^ 0, we conclude that 


Al < A . 


□ 


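Corollary 5.5.1. Let $\mathbf{A}$ be a symmetric matrix of order $n \times n$ such that $\|\mathbf{A}\| < 1$, where $\|\mathbf{A}\|$ is any matrix norm of $\mathbf{A}$. Then $\sum_{k=0}^{\infty}\mathbf{A}^k$ converges to $(\mathbf{I} - \mathbf{A})^{-1}$.

Proof. This result follows from Theorem 5.5.2, since, by Theorem 5.5.3, $|\lambda_i| \le \|\mathbf{A}\| < 1$ for $i = 1, 2, \ldots, n$. □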





INFINITE SEQUENCES AND SERIES 


5.6. APPLICATIONS IN STATISTICS 

Sequences and series have many useful applications in statistics. Some of these applications will be discussed in this section.


5.6.1. Moments of a Discrete Distribution

Perhaps one of the most visible applications of infinite series in statistics is in the study of the distribution of a discrete random variable that can assume a countable number of values. Under certain conditions, this distribution can be completely determined by its moments. By definition, the moments of a distribution are a set of descriptive constants that are useful for measuring its properties.

Let $X$ be a discrete random variable that takes on the values $x_0, x_1, \ldots, x_n, \ldots$, with probabilities $p(n)$, $n \ge 0$. Then, by definition, the $k$th central moment of $X$, denoted by $\mu_k$, is

$$\mu_k = E[(X - \mu)^k] = \sum_{n=0}^{\infty}(x_n - \mu)^k p(n), \quad k = 1, 2, \ldots,$$

where $\mu = E(X) = \sum_{n=0}^{\infty}x_n p(n)$ is the mean of $X$. We note that $\mu_2 = \sigma^2$ is the variance of $X$. The $k$th noncentral moment of $X$ is given by the series

$$\mu_k' = \sum_{n=0}^{\infty}x_n^k p(n), \quad k = 1, 2, \ldots. \qquad (5.55)$$

We note that $\mu_1' = \mu$. If, for some integer $N$, $|x_n| \ge 1$ for $n > N$, and if the series in (5.55) converges absolutely, then so does the series for $\mu_j'$ ($j = 1, 2, \ldots, k - 1$). This follows from applying the comparison test:

$$|x_n|^j p(n) \le |x_n|^k p(n) \quad \text{if } j < k \text{ and } n > N.$$


Examples of discrete random variables with a countable number of values include the Poisson (see Section 4.5.3) and the negative binomial. The latter random variable represents the number $n$ of failures before the $r$th success when independent trials are performed, each of which has two possible outcomes, success or failure, with a constant probability $p$ of success on each trial. Its probability mass function is therefore of the form

$$p(n) = \binom{n + r - 1}{n}p^r(1 - p)^n, \quad n = 0, 1, 2, \ldots.$$






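By contrast, the Poisson random variable has the probability mass function

$$p(n) = \frac{e^{-\lambda}\lambda^n}{n!}, \quad n = 0, 1, 2, \ldots,$$

where $\lambda$ is the mean of $X$. We can verify that $\lambda$ is the mean by writing

$$\begin{aligned}
\mu &= \sum_{n=0}^{\infty}n\,\frac{e^{-\lambda}\lambda^n}{n!} = \lambda e^{-\lambda}\sum_{n=1}^{\infty}\frac{\lambda^{n-1}}{(n-1)!} \\
&= \lambda e^{-\lambda}\sum_{n=0}^{\infty}\frac{\lambda^n}{n!} \\
&= \lambda e^{-\lambda}e^{\lambda} \quad \text{by Maclaurin's expansion of } e^{\lambda} \\
&= \lambda.
\end{aligned}$$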


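The second noncentral moment of the Poisson distribution is

$$\begin{aligned}
\mu_2' &= \sum_{n=0}^{\infty}n^2\,\frac{e^{-\lambda}\lambda^n}{n!} = e^{-\lambda}\lambda\sum_{n=1}^{\infty}n\,\frac{\lambda^{n-1}}{(n-1)!} \\
&= e^{-\lambda}\lambda\left[\lambda\sum_{n=2}^{\infty}\frac{\lambda^{n-2}}{(n-2)!} + \sum_{n=1}^{\infty}\frac{\lambda^{n-1}}{(n-1)!}\right] \\
&= e^{-\lambda}\lambda\,[\lambda e^{\lambda} + e^{\lambda}] \\
&= \lambda^2 + \lambda.
\end{aligned}$$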

In general, the $k$th noncentral moment of the Poisson distribution is given by the series

$$\mu_k' = \sum_{n=0}^{\infty}n^k\,\frac{e^{-\lambda}\lambda^n}{n!}, \quad k = 1, 2, \ldots,$$


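which converges for any $k$. This can be shown, for example, by the ratio test, with $a_n = n^k e^{-\lambda}\lambda^n/n!$:

$$\lim_{n\to\infty}\frac{a_{n+1}}{a_n} = \lim_{n\to\infty}\left(\frac{n+1}{n}\right)^k\frac{\lambda}{n+1} = 0 < 1.$$

Thus all the noncentral moments of the Poisson distribution exist.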
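The series for $\mu_k'$ can be summed numerically to confirm the results above. This plain-Python sketch truncates the series after a fixed number of terms. That truncation is an assumption, but a harmless one here, because the terms eventually decay faster than any geometric sequence. The value for $k = 3$ is compared with the standard identity $\mu_3' = \lambda^3 + 3\lambda^2 + \lambda$, which is not derived in the text.

```python
import math


def poisson_noncentral_moment(k, lam, n_terms=200):
    """Truncated sum of n^k * exp(-lam) * lam^n / n! over n = 0, ..., n_terms - 1."""
    p = math.exp(-lam)              # p(0)
    total = 0.0
    for n in range(n_terms):
        total += n ** k * p
        p *= lam / (n + 1)          # p(n + 1) = p(n) * lam / (n + 1)
    return total


lam = 3.5
print(poisson_noncentral_moment(1, lam))   # close to lam = 3.5
print(poisson_noncentral_moment(2, lam))   # close to lam**2 + lam = 15.75
print(poisson_noncentral_moment(3, lam))   # close to lam**3 + 3*lam**2 + lam = 83.125
```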
Similarly, for the negative binomial distribution we have

$$\mu = \sum_{n=0}^{\infty}n\binom{n + r - 1}{n}p^r(1 - p)^n = \frac{r(1 - p)}{p} \quad \text{(why?)}, \qquad (5.56)$$


$$\mu_2' = \sum_{n=0}^{\infty}n^2\binom{n + r - 1}{n}p^r(1 - p)^n = \frac{r(1 - p)(1 + r - rp)}{p^2} \quad \text{(why?)}, \qquad (5.57)$$


and the $k$th noncentral moment,

$$\mu_k' = \sum_{n=0}^{\infty}n^k\binom{n + r - 1}{n}p^r(1 - p)^n, \quad k = 1, 2, \ldots, \qquad (5.58)$$


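exists for any $k$, since, by the ratio test (with $a_n$ denoting the $n$th term of the series in (5.58)),

$$\lim_{n\to\infty}\frac{a_{n+1}}{a_n} = \lim_{n\to\infty}\left(\frac{n+1}{n}\right)^k\frac{\binom{n+r}{n+1}}{\binom{n+r-1}{n}}(1 - p) = (1 - p)\lim_{n\to\infty}\left(\frac{n+1}{n}\right)^k\frac{n + r}{n + 1} = 1 - p < 1,$$

which proves convergence of the series in (5.58).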
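Formulas (5.56) and (5.57) can be checked by summing the truncated series directly. The plain-Python sketch below uses illustrative values $r = 3$ and $p = 0.4$ (assumptions, not taken from the text). It builds the probabilities recursively from $p(n+1)/p(n) = (1 - p)(n + r)/(n + 1)$, which is the same ratio that appears in the ratio-test computation above.

```python
import math


def negbin_moment(k, r, p, n_terms=2000):
    """Truncated sum of n^k * C(n+r-1, n) p^r (1-p)^n over n = 0, ..., n_terms - 1."""
    prob = p ** r                    # p(0) = p^r
    total = 0.0
    for n in range(n_terms):
        total += n ** k * prob
        prob *= (1 - p) * (n + r) / (n + 1)
    return total


r, p = 3, 0.4
mean = negbin_moment(1, r, p)
second = negbin_moment(2, r, p)
print(mean, r * (1 - p) / p)                              # both about 4.5
print(second, r * (1 - p) * (1 + r - r * p) / p ** 2)     # both about 31.5
```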

A very important inequality that concerns the mean $\mu$ and variance $\sigma^2$ of any random variable $X$ (not just the discrete ones) is Chebyshev's inequality, namely,

$$P(|X - \mu| \ge r\sigma) \le \frac{1}{r^2},$$

or equivalently,

$$P(|X - \mu| < r\sigma) \ge 1 - \frac{1}{r^2}, \qquad (5.59)$$

where $r$ is any positive number (see, for example, Lindgren, 1976, Section 2.3.2). The importance of this inequality stems from the fact that it is independent of the exact distribution of $X$ and connects the variance of $X$ with the distribution of its values. For example, inequality (5.59) states that at least $(1 - 1/r^2) \times 100\%$ of the values of $X$ fall within $r\sigma$ from its mean, where $\sigma = \sqrt{\sigma^2}$ is the standard deviation of $X$.

Chebyshev's inequality is a special case of a more general inequality called Markov's inequality. If $b$ is a nonzero constant and $h(x)$ is a nonnegative function, then

$$P[h(X) \ge b^2] \le \frac{1}{b^2}E[h(X)],$$

provided that $E[h(X)]$ exists. Chebyshev's inequality follows from Markov's inequality by choosing $h(X) = (X - \mu)^2$ and $b = r\sigma$.
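To show how conservative Chebyshev's bound can be, the sketch below computes the exact tail probability $P(|X - \mu| \ge r\sigma)$ for a Poisson variable and compares it with $1/r^2$. The mean $\lambda = 4$ and the truncation point are assumptions chosen for illustration. The code is plain Python.

```python
import math


def poisson_pmf(n, lam):
    """exp(-lam) lam^n / n!, computed on the log scale to avoid overflow."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def chebyshev_tail(lam, r, n_max=200):
    """Return (exact P(|X - lam| >= r*sigma), Chebyshev bound 1/r^2) for X ~ Poisson(lam)."""
    sigma = math.sqrt(lam)
    tail = sum(poisson_pmf(n, lam) for n in range(n_max) if abs(n - lam) >= r * sigma)
    return tail, 1 / r ** 2


for r in (1.5, 2.0, 3.0):
    tail, bound = chebyshev_tail(4.0, r)
    print(f"r = {r}: exact tail {tail:.4f} <= bound {bound:.4f}")
```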

Another important result that concerns the moments of a distribution is given by the following theorem, regarding what is known as the Stieltjes moment problem, which also applies to any random variable:

Theorem 5.6.1. Suppose that the moments $\mu_k'$ ($k = 1, 2, \ldots$) of a random variable $X$ exist, and the series

$$\sum_{k=1}^{\infty}\frac{\mu_k'}{k!}\tau^k \qquad (5.60)$$

is absolutely convergent for some $\tau > 0$. Then these moments uniquely determine the cumulative distribution function $F(x)$ of $X$.

Proof. See, for example, Fisz (1963, Theorem 3.2.1). □

In particular, if

$$|\mu_k'| \le M^k, \quad k = 1, 2, \ldots,$$

for some constant $M$, then the series in (5.60) converges absolutely for any $\tau > 0$ by the comparison test. This is true because the series $\sum_{k=1}^{\infty}(M^k/k!)\tau^k$ converges (for example, by the ratio test) for any value of $\tau$.


It should be noted that absolute convergence of the series in (5.60) is a sufficient condition for the unique determination of $F(x)$, but is not a necessary condition. This is shown in Rao (1973, page 106). Furthermore, if some moments of $X$ fail to exist, then the remaining moments that do exist cannot determine $F(x)$ uniquely. The following counterexample is given in Fisz (1963, page 74):

Let $X$ be a discrete random variable that takes on the values $x_n = 2^n/n^2$, $n \ge 1$, with probabilities $p(n) = 1/2^n$, $n \ge 1$. Then

$$\mu = E(X) = \sum_{n=1}^{\infty}\frac{1}{n^2},$$

which exists, because the series is convergent. However, $\mu_2'$ does not exist, because

$$\mu_2' = E(X^2) = \sum_{n=1}^{\infty}\frac{2^n}{n^4}$$

and this series is divergent, since $2^n/n^4 \to \infty$ as $n \to \infty$.

Now, let $Y$ be another discrete random variable that takes on the value zero with probability $\frac{1}{2}$ and the values $y_n = 2^{n+1}/n^2$, $n \ge 1$, with probabilities $q(n) = 1/2^{n+1}$, $n \ge 1$. Then,

$$E(Y) = \sum_{n=1}^{\infty}\frac{1}{n^2} = E(X).$$

The second noncentral moment of $Y$ does not exist, since

$$\mu_2' = E(Y^2) = \sum_{n=1}^{\infty}\frac{2^{n+1}}{n^4},$$

and this series is divergent.

Since $\mu_2'$ does not exist for both $X$ and $Y$, none of their noncentral moments of order $k > 2$ exist either, as can be seen from applying the comparison test. Thus $X$ and $Y$ have the same first noncentral moment, but do not have noncentral moments of any order greater than 1. These two random variables obviously have different distributions.
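Here is a short numerical sketch of this counterexample in plain Python, with an arbitrary truncation at $N = 60$ terms. The partial sums of $E(X)$ and $E(Y)$ coincide term by term and approach $\pi^2/6$. The partial sums for $E(X^2)$ and $E(Y^2)$, in contrast, grow without bound.

```python
import math

N = 60
ns = range(1, N + 1)

# X takes the value 2^n / n^2 with probability 1/2^n, n >= 1.
mean_X = sum((2 ** n / n ** 2) * (1 / 2 ** n) for n in ns)
second_X = sum((2 ** n / n ** 2) ** 2 * (1 / 2 ** n) for n in ns)

# Y takes the value 0 with probability 1/2 and 2^(n+1) / n^2 with probability 1/2^(n+1).
total_prob_Y = 0.5 + sum(1 / 2 ** (n + 1) for n in ns)
mean_Y = sum((2 ** (n + 1) / n ** 2) * (1 / 2 ** (n + 1)) for n in ns)
second_Y = sum((2 ** (n + 1) / n ** 2) ** 2 * (1 / 2 ** (n + 1)) for n in ns)

print(mean_X, mean_Y, math.pi ** 2 / 6)   # equal partial sums, slightly below pi^2/6
print(total_prob_Y)                       # essentially 1
print(second_X, second_Y)                 # already enormous at N = 60
```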

5.6.2. Moment and Probability Generating Functions 

Let $X$ be a discrete random variable that takes on the values $x_0, x_1, x_2, \ldots$ with probabilities $p(n)$, $n \ge 0$.

The Moment Generating Function of X
This function is defined as

$$\phi(t) = E(e^{tX}) = \sum_{n=0}^{\infty}e^{tx_n}p(n), \qquad (5.61)$$

provided that the series converges. In particular, if $x_n = n$ for $n \ge 0$, then

$$\phi(t) = \sum_{n=0}^{\infty}e^{tn}p(n), \qquad (5.62)$$

which is a power series in $e^t$. If $\rho$ is the radius of convergence for this series, then by Theorem 5.4.4, $\phi(t)$ is a continuous function of $t$ and has derivatives of all orders inside its interval of convergence. Since

$$\left.\frac{d^k\phi(t)}{dt^k}\right|_{t=0} = E(X^k) = \mu_k', \quad k = 1, 2, \ldots, \qquad (5.63)$$

$\phi(t)$, when it exists, can be used to obtain all noncentral moments of $X$, which can completely determine the distribution of $X$ by Theorem 5.6.1.

From (5.63), by using Maclaurin's expansion of $\phi(t)$, we can obtain an expression for this function as a power series in $t$:

$$\phi(t) = \phi(0) + \sum_{n=1}^{\infty}\frac{\phi^{(n)}(0)}{n!}t^n = 1 + \sum_{n=1}^{\infty}\frac{\mu_n'}{n!}t^n. \qquad (5.64)$$


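Let us now go back to the series in (5.62). If

$$\lim_{n\to\infty}\frac{p(n+1)}{p(n)} = \ell,$$

then by Theorem 5.4.2, the radius of convergence $\rho$ is

$$\rho = \begin{cases} 1/\ell, & 0 < \ell < \infty, \\ 0, & \ell = \infty, \\ \infty, & \ell = 0. \end{cases}$$

Alternatively, if $\limsup_{n\to\infty}[p(n)]^{1/n} = q$, then

$$\rho = \begin{cases} 1/q, & 0 < q < \infty, \\ 0, & q = \infty, \\ \infty, & q = 0. \end{cases}$$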


For example, for the Poisson distribution, where

$$p(n) = \frac{e^{-\lambda}\lambda^n}{n!}, \quad n = 0, 1, 2, \ldots,$$

we have $\lim_{n\to\infty}[p(n+1)/p(n)] = \lim_{n\to\infty}[\lambda/(n+1)] = 0$. Hence, $\rho = \infty$, that is,

$$\phi(t) = \sum_{n=0}^{\infty}\frac{e^{-\lambda}\lambda^n}{n!}e^{tn}$$

converges uniformly for any value of $t$ for which $e^t < \infty$, that is, $-\infty < t < \infty$.

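As a matter of fact, a closed-form expression for $\phi(t)$ can be found, since

$$\phi(t) = e^{-\lambda}\sum_{n=0}^{\infty}\frac{(\lambda e^t)^n}{n!} = e^{-\lambda}\exp(\lambda e^t) = \exp(\lambda e^t - \lambda) \quad \text{for all } t.$$

The $k$th noncentral moment of $X$ is then given by

$$\mu_k' = \left.\frac{d^k\phi(t)}{dt^k}\right|_{t=0} = \left.\frac{d^k\exp(\lambda e^t - \lambda)}{dt^k}\right|_{t=0}. \qquad (5.65)$$

In particular, the first two noncentral moments are

$$\mu_1' = \mu = \lambda, \qquad \mu_2' = \lambda + \lambda^2.$$

This confirms our earlier finding concerning these two moments.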
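The derivatives in (5.65) can also be approximated numerically. The plain-Python sketch below uses central finite differences with an assumed step size $h = 10^{-4}$. This is an illustrative substitute for exact differentiation, accurate here to several significant digits.

```python
import math


def poisson_mgf(t, lam):
    """Moment generating function exp(lam * e^t - lam) of the Poisson distribution."""
    return math.exp(lam * math.exp(t) - lam)


def first_two_moments_numeric(lam, h=1e-4):
    """Central-difference estimates of phi'(0) and phi''(0)."""
    f_plus, f_0, f_minus = poisson_mgf(h, lam), poisson_mgf(0.0, lam), poisson_mgf(-h, lam)
    d1 = (f_plus - f_minus) / (2 * h)
    d2 = (f_plus - 2 * f_0 + f_minus) / h ** 2
    return d1, d2


lam = 2.0
d1, d2 = first_two_moments_numeric(lam)
print(d1, lam)              # mu'_1 = lambda
print(d2, lam + lam ** 2)   # mu'_2 = lambda + lambda^2
```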

It should be noted that formula (5.63) is valid provided that there exists a $\delta > 0$ such that the neighborhood $N_\delta(0)$ is contained inside the interval of convergence. For example, let $X$ have the probability mass function

$$p(n) = \frac{6}{\pi^2 n^2}, \quad n = 1, 2, \ldots.$$


Then

$$\lim_{n\to\infty}\frac{p(n+1)}{p(n)} = 1.$$

Hence, by Theorem 5.4.4, the series

$$\phi(t) = \sum_{n=1}^{\infty}\frac{6e^{tn}}{\pi^2 n^2}$$

converges uniformly for values of $t$ satisfying $e^t \le r$, where $r < 1$, or equivalently, for $t \le \log r < 0$. If, however, $t > 0$, then the series diverges. Thus there does not exist a neighborhood $N_\delta(0)$ that is contained inside the interval of convergence for any $\delta > 0$. Consequently, formula (5.63) does not hold in this case.
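The failure of (5.63) in this example can be seen numerically. In the plain-Python sketch below (the truncation points are arbitrary assumptions), partial sums of $\phi(t) = \sum 6e^{tn}/(\pi^2 n^2)$ settle down for $t \le 0$ but explode for any $t > 0$. As a result, no open interval around $t = 0$ lies inside the region of convergence.

```python
import math


def partial_mgf(t, n_terms):
    """Partial sum of 6 e^(t n) / (pi^2 n^2) over n = 1, ..., n_terms."""
    c = 6 / math.pi ** 2
    return sum(c * math.exp(t * n) / n ** 2 for n in range(1, n_terms + 1))


for t in (-0.1, 0.0, 0.05):
    print(t, partial_mgf(t, 1000), partial_mgf(t, 2000))
```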

From the moment generating function we can derive a sequence of constants that play a role similar to that of the moments. These constants are called cumulants. They have properties that are, in certain circumstances, more useful than those of the moments. Cumulants were originally defined and studied by Thiele (1903).

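By definition, the cumulants of $X$, denoted by $\kappa_1, \kappa_2, \ldots, \kappa_n, \ldots$, are constants that satisfy the following identity in $t$:

$$\exp\left(\kappa_1 t + \frac{\kappa_2 t^2}{2!} + \cdots + \frac{\kappa_n t^n}{n!} + \cdots\right) = 1 + \mu_1' t + \frac{\mu_2'}{2!}t^2 + \cdots + \frac{\mu_n'}{n!}t^n + \cdots. \qquad (5.66)$$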


Using formula (5.64), this identity can be written as

$$\sum_{n=1}^{\infty}\frac{\kappa_n}{n!}t^n = \log\phi(t), \qquad (5.67)$$

provided that $\phi(t)$ exists and is positive. By definition, the natural logarithm of the moment generating function of $X$ is called the cumulant generating function.

Formula (5.66) can be used to express the noncentral moments in terms of the cumulants, and vice versa. Kendall and Stuart (1977, Section 3.14) give a general relationship that can be used for this purpose. For example,

$$\kappa_1 = \mu_1', \qquad \kappa_2 = \mu_2' - \mu_1'^2, \qquad \kappa_3 = \mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^3.$$


The cumulants have an interesting property in that they are, except for $\kappa_1$, invariant to any constant shift $c$ in $X$. That is, for $n = 2, 3, \ldots$, $\kappa_n$ is not changed if $X$ is replaced by $X + c$. This follows from noting that

$$E[e^{(X+c)t}] = e^{ct}\phi(t),$$

which is the moment generating function of $X + c$. But

$$\log[e^{ct}\phi(t)] = ct + \log\phi(t).$$


By comparison with (5.67) we can then conclude that except for $\kappa_1$, the cumulants of $X + c$ are the same as those of $X$. This contrasts sharply with the noncentral moments of $X$, which are not invariant to such a shift.

Another advantage of using cumulants is that they can be employed to 
obtain approximate expressions for the percentile points of the distribution 
of X (see Section 9.5.1). 


Example 5.6.1. Let $X$ be a Poisson random variable whose moment generating function is given by formula (5.65). By applying (5.67) we get

$$\sum_{n=1}^{\infty}\frac{\kappa_n}{n!}t^n = \log[\exp(\lambda e^t - \lambda)] = \lambda e^t - \lambda = \lambda\sum_{n=1}^{\infty}\frac{t^n}{n!}.$$

Here, we have made use of Maclaurin's expansion of $e^t$. This series converges for any value of $t$. It follows that $\kappa_n = \lambda$ for $n = 1, 2, \ldots$.
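The relations between cumulants and noncentral moments given above can be checked numerically. The plain-Python sketch below does three things:

- it computes $\kappa_1, \kappa_2, \kappa_3$ for a small, made-up discrete distribution;
- it shows that shifting $X$ by a constant $c$ changes only $\kappa_1$;
- it confirms $\kappa_1 = \kappa_2 = \kappa_3 = \lambda$ from the Poisson moments $\lambda$, $\lambda^2 + \lambda$, and $\lambda^3 + 3\lambda^2 + \lambda$.

The last of these Poisson moments is a standard identity not derived in the text.

```python
def noncentral_moments(values, probs, k_max=3):
    """Return [mu'_1, ..., mu'_k_max] for a finite discrete distribution."""
    return [sum(p * x ** k for x, p in zip(values, probs)) for k in range(1, k_max + 1)]


def first_three_cumulants(m1, m2, m3):
    """kappa_1 = mu'_1, kappa_2 = mu'_2 - mu'_1^2, kappa_3 = mu'_3 - 3 mu'_1 mu'_2 + 2 mu'_1^3."""
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3


values, probs = [0, 1, 3], [0.2, 0.5, 0.3]       # illustrative distribution
c = 10.0
k_X = first_three_cumulants(*noncentral_moments(values, probs))
k_shift = first_three_cumulants(*noncentral_moments([x + c for x in values], probs))
print(k_X, k_shift)   # kappa_1 shifts by c; kappa_2 and kappa_3 are unchanged

lam = 2.5
k_poisson = first_three_cumulants(lam, lam ** 2 + lam, lam ** 3 + 3 * lam ** 2 + lam)
print(k_poisson)      # (2.5, 2.5, 2.5) up to rounding
```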

The Probability Generating Function

This is similar to the moment generating function. It is defined as

$$\psi(t) = E(t^X) = \sum_{n=0}^{\infty}t^{x_n}p(n).$$

In particular, if $x_n = n$ for $n \ge 0$, then

$$\psi(t) = \sum_{n=0}^{\infty}t^n p(n), \qquad (5.68)$$

which is a power series in $t$. Within its interval of convergence, this series represents a continuous function with derivatives of all orders. We note that $\psi(0) = p(0)$ and that

$$\frac{1}{k!}\left.\frac{d^k\psi(t)}{dt^k}\right|_{t=0} = p(k), \quad k = 1, 2, \ldots. \qquad (5.69)$$

Thus, the entire probability distribution of $X$ is completely determined by $\psi(t)$.

The probability generating function is also useful in determining the moments of $X$. This is accomplished by using the relation

$$\left.\frac{d^k\psi(t)}{dt^k}\right|_{t=1} = \sum_{n=k}^{\infty}n(n-1)\cdots(n-k+1)\,p(n) = E[X(X-1)\cdots(X-k+1)]. \qquad (5.70)$$

The quantity on the right-hand side of (5.70) is called the $k$th factorial moment of $X$, which we denote by $\theta_k$. The noncentral moments of $X$ can be derived from the $\theta_k$'s. For example,

$$\begin{aligned}
\mu_1' &= \theta_1, \\
\mu_2' &= \theta_2 + \theta_1, \\
\mu_3' &= \theta_3 + 3\theta_2 + \theta_1, \\
\mu_4' &= \theta_4 + 6\theta_3 + 7\theta_2 + \theta_1.
\end{aligned}$$

Obviously, formula (5.70) is valid provided that $t = 1$ belongs to the interval of convergence of the series in (5.68).

If a closed-form expression is available for the moment generating function, then a corresponding expression can be obtained for $\psi(t)$ by replacing $e^t$ with $t$. For example, from formula (5.65), the probability generating function for the Poisson distribution is given by $\psi(t) = \exp(\lambda t - \lambda)$.
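For the Poisson case, differentiating $\psi(t) = \exp(\lambda t - \lambda)$ $k$ times and setting $t = 1$ gives the factorial moments $\theta_k = \lambda^k$. The plain-Python sketch below uses a truncated series and an illustrative $\lambda = 1.7$. It computes $\theta_k = E[X(X-1)\cdots(X-k+1)]$ directly from the series in (5.70) and checks the expressions for $\mu_3'$ and $\mu_4'$ in terms of the $\theta_k$'s.

```python
import math


def poisson_expectation(g, lam, n_terms=200):
    """Truncated sum of g(n) * exp(-lam) * lam^n / n! over n = 0, ..., n_terms - 1."""
    p = math.exp(-lam)
    total = 0.0
    for n in range(n_terms):
        total += g(n) * p
        p *= lam / (n + 1)
    return total


def falling_factorial(n, k):
    """n (n - 1) ... (n - k + 1); equals 0 when n < k."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


lam = 1.7
theta = {k: poisson_expectation(lambda n, k=k: falling_factorial(n, k), lam) for k in range(1, 5)}
mu3 = poisson_expectation(lambda n: n ** 3, lam)
mu4 = poisson_expectation(lambda n: n ** 4, lam)

print([theta[k] / lam ** k for k in range(1, 5)])          # all close to 1
print(mu3, theta[3] + 3 * theta[2] + theta[1])
print(mu4, theta[4] + 6 * theta[3] + 7 * theta[2] + theta[1])
```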


5.6.3. Some Limit Theorems 

In Section 3.7 we defined convergence in probability of a sequence of 
random variables. In Section 4.5.1 convergence in distribution of the same 
sequence was introduced. In this section we introduce yet another type of 
convergence. 

Definition 5.6.1. A sequence $\{X_n\}_{n=1}^{\infty}$ of random variables converges in quadratic mean to a random variable $X$ if

$$\lim_{n\to\infty}E(X_n - X)^2 = 0.$$

This convergence is written symbolically as $X_n \xrightarrow{\text{q.m.}} X$. □

Convergence in quadratic mean implies convergence in probability. This follows directly from applying Markov's inequality: If $X_n \xrightarrow{\text{q.m.}} X$, then for any $\epsilon > 0$,

$$P(|X_n - X| \ge \epsilon) \le \frac{1}{\epsilon^2}E(X_n - X)^2 \to 0$$

as $n \to \infty$. This shows that the sequence $\{X_n\}_{n=1}^{\infty}$ converges in probability to $X$.


5.6.3.1. The Weak Law of Large Numbers (Khinchine's Theorem)

Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of independent and identically distributed random variables with a finite mean $\mu$. Then $\bar{X}_n$ converges in probability to $\mu$ as $n \to \infty$, where $\bar{X}_n = (1/n)\sum_{i=1}^{n}X_i$ is the sample mean of a sample of size $n$.

Proof. See, for example, Lindgren (1976, Section 2.5.1) or Rao (1973, Section 2c.3). □
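A simulation cannot prove the weak law, but it can illustrate it. The plain-Python sketch below draws from Uniform(0, 1), so that $\mu = 0.5$; this distribution and the fixed seed are chosen purely for illustration. It shows the sample mean $\bar{X}_n$ settling near $\mu$ as $n$ grows.

```python
import random


def sample_mean(n, rng):
    """Mean of n independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(n)) / n


rng = random.Random(12345)          # fixed seed so the run is reproducible
for n in (10, 1000, 100000):
    xbar = sample_mean(n, rng)
    print(n, xbar, abs(xbar - 0.5))
```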

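Definition 5.6.2. A sequence $\{X_n\}_{n=1}^{\infty}$ of random variables converges strongly, or almost surely, to a random variable $X$, written symbolically as $X_n \xrightarrow{\text{a.s.}} X$, if for any $\epsilon > 0$,

$$\lim_{N\to\infty}P\left(\sup_{n \ge N}|X_n - X| > \epsilon\right) = 0.$$

□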

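Theorem 5.6.2. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables. Then we have the following:

1. If $X_n \xrightarrow{\text{a.s.}} c$, where $c$ is constant, then $X_n$ converges in probability to $c$.

2. If $X_n \xrightarrow{\text{q.m.}} c$, and the series $\sum_{n=1}^{\infty}E(X_n - c)^2$ converges, then $X_n \xrightarrow{\text{a.s.}} c$.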

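5.6.3.2. The Strong Law of Large Numbers (Kolmogorov's Theorem)

Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of independent random variables such that $E(X_n) = \mu_n$ and $\mathrm{Var}(X_n) = \sigma_n^2$, $n = 1, 2, \ldots$. If the series $\sum_{n=1}^{\infty}\sigma_n^2/n^2$ converges, then $\bar{X}_n - \bar{\mu}_n \xrightarrow{\text{a.s.}} 0$, where $\bar{\mu}_n = (1/n)\sum_{i=1}^{n}\mu_i$.

Proof. See Rao (1973, Section 2c.3). □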

5.6.3.3. The Continuity Theorem for Probability Generating Functions

See Feller (1968, page 280).

Suppose that for every $k \ge 1$, the sequence $\{p_k(n)\}_{n=0}^{\infty}$ represents a discrete probability distribution. Let $\psi_k(t) = \sum_{n=0}^{\infty}p_k(n)t^n$ be the corresponding probability generating function ($k = 1, 2, \ldots$). In order for a limit

$$q_n = \lim_{k\to\infty}p_k(n)$$

to exist for every $n = 0, 1, \ldots$, it is necessary and sufficient that the limit

$$\psi(t) = \lim_{k\to\infty}\psi_k(t)$$

exist for every $t$ in the open interval $(0, 1)$. In this case,

$$\psi(t) = \sum_{n=0}^{\infty}q_n t^n.$$
This theorem implies that a sequence of discrete probability distributions converges if and only if the corresponding probability generating functions converge. It is important here to point out that the $q_n$'s may not form a discrete probability distribution (because they may not sum to 1). The function $\psi(t)$ may not therefore be a probability generating function.
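As an illustration of the continuity theorem, consider binomial distributions with $k$ trials and success probability $\lambda/k$. This is an illustrative choice that matches the setting of Example 5.6.2 in Section 5.6.5. Their probability generating functions $(1 - \lambda/k + \lambda t/k)^k$ approach the Poisson generating function $\exp(\lambda t - \lambda)$ at each fixed $t$ in $(0, 1)$, as the plain-Python sketch shows.

```python
import math


def binomial_pgf(t, k, p):
    """Probability generating function (1 - p + p t)^k of the binomial(k, p) distribution."""
    return (1 - p + p * t) ** k


def poisson_pgf(t, lam):
    """Probability generating function exp(lam t - lam) of the Poisson(lam) distribution."""
    return math.exp(lam * t - lam)


lam, t = 2.0, 0.5
for k in (10, 100, 10000):
    gap = abs(binomial_pgf(t, k, lam / k) - poisson_pgf(t, lam))
    print(k, gap)
```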


5.6.4. Power Series and Logarithmic Series Distributions

The power series distribution, which was introduced by Kosambi (1949), represents a family of discrete distributions, such as the binomial, Poisson, and negative binomial. Its probability mass function is given by

$$p(n) = \frac{a_n\theta^n}{f(\theta)}, \quad n = 0, 1, 2, \ldots,$$

where $a_n \ge 0$, $\theta > 0$, and $f(\theta)$ is the function

$$f(\theta) = \sum_{n=0}^{\infty}a_n\theta^n. \qquad (5.71)$$

This function is defined provided that $\theta$ falls inside the interval of convergence of the series in (5.71).

For example, for the Poisson distribution, $\theta = \lambda$, where $\lambda$ is the mean, $a_n = 1/n!$ for $n = 0, 1, 2, \ldots$, and $f(\theta) = e^{\theta}$. For the negative binomial, $\theta = 1 - p$ and $a_n = \binom{n + r - 1}{n}$, $n = 0, 1, 2, \ldots$, where $n$ = number of failures, $r$ = number of successes, and $p$ = probability of success on each trial, and thus

$$f(\theta) = \sum_{n=0}^{\infty}\binom{n + r - 1}{n}(1 - p)^n = \frac{1}{p^r}.$$

A special case of the power series distribution is the logarithmic series distribution. It was first introduced by Fisher, Corbet, and Williams (1943) while studying abundance and diversity for insect trap data. The probability mass function for this distribution is

$$p(n) = -\frac{\theta^n}{n\log(1 - \theta)}, \quad n = 1, 2, \ldots,$$

where $0 < \theta < 1$.
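The following plain-Python sketch is a quick consistency check that the power series form $p(n) = a_n\theta^n/f(\theta)$ gives proper probability distributions in two cases:

- Poisson: $a_n = 1/n!$ and $f(\theta) = e^{\theta}$;
- logarithmic series: $a_n = 1/n$ for $n \ge 1$, $a_0 = 0$, and $f(\theta) = -\log(1 - \theta)$.

The value of $\theta$ and the truncation points are illustrative assumptions.

```python
import math


def power_series_pmf(a, theta, f_theta, n_max):
    """Return [a(n) * theta^n / f(theta) for n = 0, ..., n_max]."""
    return [a(n) * theta ** n / f_theta for n in range(n_max + 1)]


theta = 0.6
poisson = power_series_pmf(lambda n: 1 / math.factorial(n), theta, math.exp(theta), 60)
logarithmic = power_series_pmf(lambda n: 0.0 if n == 0 else 1 / n, theta, -math.log(1 - theta), 400)

print(sum(poisson), sum(logarithmic))   # both essentially 1
print(logarithmic[1])                   # p(1) = -theta / log(1 - theta)
```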



194 


INFINITE SEQUENCES AND SERIES 


The logarithmic series distribution is useful in the analysis of various kinds of data. A description of some of its applications can be found, for example, in Johnson and Kotz (1969, Chapter 7).


5.6.5. Poisson Approximation to Power Series Distributions

See Pérez-Abreu (1991).

The Poisson distribution can provide an approximation to the distribution of the sum of random variables having power series distributions. This is based on the following theorem:

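Theorem 5.6.3. For each $k \ge 1$, let $X_1, X_2, \ldots, X_k$ be independent nonnegative integer-valued random variables with a common power series distribution

$$p_k(n) = a_n\theta_k^n/f(\theta_k), \quad n = 0, 1, \ldots,$$

where $a_n \ge 0$ ($n = 0, 1, \ldots$) are independent of $k$ and

$$f(\theta_k) = \sum_{n=0}^{\infty}a_n\theta_k^n, \quad \theta_k > 0.$$

Let $a_0 > 0$, $\lambda > 0$ be fixed and $S_k = \sum_{i=1}^{k}X_i$. If $k\theta_k \to \lambda$ as $k \to \infty$, then

$$\lim_{k\to\infty}P(S_k = n) = e^{-\lambda_0}\lambda_0^n/n!, \quad n = 0, 1, \ldots,$$

where $\lambda_0 = \lambda a_1/a_0$.

Proof. See Pérez-Abreu (1991, page 43). □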

By using this theorem we can obtain the well-known Poisson approximation to the binomial and the negative binomial distributions, as shown below.

Example 5.6.2 (The Binomial Distribution). For each $k \ge 1$, let $X_1, \ldots, X_k$ be a sequence of independent Bernoulli random variables with success probability $p_k$. Let $S_k = \sum_{i=1}^{k}X_i$. Suppose that $kp_k \to \lambda > 0$ as $k \to \infty$. Then, for each $n = 0, 1, \ldots$,

$$\lim_{k\to\infty}P(S_k = n) = \lim_{k\to\infty}\binom{k}{n}p_k^n(1 - p_k)^{k-n} = e^{-\lambda}\lambda^n/n!.$$

This follows from the fact that the Bernoulli distribution with success probability $p_k$ is a power series distribution with $\theta_k = p_k/(1 - p_k)$ and $f(\theta_k) = 1 + \theta_k$. Since $a_0 = a_1 = 1$, and $k\theta_k \to \lambda$ as $k \to \infty$ (because $p_k \to 0$), we get from applying Theorem 5.6.3 that

$$\lim_{k\to\infty}P(S_k = n) = e^{-\lambda}\lambda^n/n!.$$
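The convergence in Example 5.6.2 can be followed numerically. This plain-Python sketch assumes $\lambda = 2$ and $p_k = \lambda/k$. It reports the largest pointwise gap between the binomial and Poisson probabilities over $n = 0, \ldots, 30$; the gap shrinks roughly in proportion to $1/k$.

```python
import math


def binomial_pmf(n, k, p):
    """C(k, n) p^n (1 - p)^(k - n), and 0 when n > k."""
    return math.comb(k, n) * p ** n * (1 - p) ** (k - n) if n <= k else 0.0


def poisson_pmf(n, lam):
    """exp(-lam) lam^n / n!."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def max_gap(k, lam, n_max=30):
    """Largest |P(S_k = n) - e^(-lam) lam^n / n!| over n = 0, ..., n_max."""
    p = lam / k
    return max(abs(binomial_pmf(n, k, p) - poisson_pmf(n, lam)) for n in range(n_max + 1))


for k in (10, 100, 1000):
    print(k, max_gap(k, 2.0))
```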

Example 5.6.3 (The Negative Binomial Distribution). We recall that a random variable $Y$ has the negative binomial distribution if it represents the number of failures $n$ (in repeated trials) before the $k$th success ($k \ge 1$). Let $p_k$ denote the probability of success on a single trial. Let $X_1, X_2, \ldots, X_k$ be random variables defined as

$X_1$ = number of failures occurring before the 1st success,
$X_2$ = number of failures occurring between the 1st success and the 2nd success,
⋮
$X_k$ = number of failures occurring between the $(k - 1)$st success and the $k$th success.


Such random variables have what is known as the geometric distribution. It is a special case of the negative binomial distribution with $k = 1$. The common probability distribution of the $X_i$'s is

$$P(X_i = n) = p_k(1 - p_k)^n, \quad n = 0, 1, \ldots; \; i = 1, 2, \ldots, k.$$

This is a power series distribution with $a_n = 1$ ($n = 0, 1, \ldots$), $\theta_k = 1 - p_k$, and

$$f(\theta_k) = \sum_{n=0}^{\infty}(1 - p_k)^n = \frac{1}{1 - (1 - p_k)} = \frac{1}{p_k}.$$

It is easy to see that $X_1, X_2, \ldots, X_k$ are independent and that $Y = S_k = \sum_{i=1}^{k}X_i$. Let us now assume that $k(1 - p_k) \to \lambda > 0$ as $k \to \infty$. Then from Theorem 5.6.3 we obtain the following result:

$$\lim_{k\to\infty}P(S_k = n) = e^{-\lambda}\lambda^n/n!, \quad n = 0, 1, \ldots.$$
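Here is a similar numerical sketch for Example 5.6.3, in plain Python. It assumes $\lambda = 2$ and $p_k = 1 - \lambda/k$, so that $k(1 - p_k) = \lambda$ exactly. As $k$ grows, the negative binomial probabilities $\binom{n+k-1}{n}p_k^k(1-p_k)^n$ approach the Poisson probabilities. Log-gamma keeps the binomial coefficients from overflowing.

```python
import math


def negbin_pmf(n, k, p):
    """C(n + k - 1, n) p^k (1 - p)^n, evaluated on the log scale."""
    log_coef = math.lgamma(n + k) - math.lgamma(n + 1) - math.lgamma(k)
    return math.exp(log_coef + k * math.log(p) + n * math.log(1 - p))


def poisson_pmf(n, lam):
    """exp(-lam) lam^n / n!, evaluated on the log scale."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def max_gap(k, lam, n_max=30):
    """Largest |P(S_k = n) - Poisson(lam) probability| over n = 0, ..., n_max."""
    p = 1 - lam / k
    return max(abs(negbin_pmf(n, k, p) - poisson_pmf(n, lam)) for n in range(n_max + 1))


for k in (10, 100, 1000):
    print(k, max_gap(k, 2.0))
```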


5.6.6. A Ridge Regression Application

Consider the linear model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$


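where $\mathbf{y}$ is a vector of $n$ response values, $\mathbf{X}$ is an $n \times p$ matrix of rank $p$, $\boldsymbol{\beta}$ is a vector of $p$ unknown parameters, and $\boldsymbol{\epsilon}$ is a random error vector such that $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\epsilon}) = \sigma^2\mathbf{I}_n$. All variables in this model are corrected for their means and scaled to unit length, so that $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{y}$ are in correlation form.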

We recall from Section 2.4.2 that if the columns of $\mathbf{X}$ are multicollinear, then the least-squares estimator of $\boldsymbol{\beta}$, namely $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, is an unreliable estimator due to large variances associated with its elements. There are several methods that can be used to combat multicollinearity. A review of such methods can be found in Ofir and Khuri (1986). Ridge regression is one of the most popular of these methods. This method, which was developed by Hoerl and Kennard (1970a, b), is based on adding a positive constant $k$ to the diagonal elements of $\mathbf{X}'\mathbf{X}$. This leads to a biased estimator $\boldsymbol{\beta}^*$ of $\boldsymbol{\beta}$, called the ridge regression estimator, which is given by

$$\boldsymbol{\beta}^* = (\mathbf{X}'\mathbf{X} + k\mathbf{I}_p)^{-1}\mathbf{X}'\mathbf{y}.$$

The elements of $\boldsymbol{\beta}^*$ can have substantially smaller variances than the corresponding elements of $\hat{\boldsymbol{\beta}}$ (see, for example, Montgomery and Peck, 1982, Section 8.5.3).

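Draper and Herzberg (1987) showed that the ridge regression residual sum of squares can be represented as a power series in $k$. More specifically, consider the vector of predicted responses,

$$\hat{\mathbf{y}}_k = \mathbf{X}\boldsymbol{\beta}^* = \mathbf{X}(\mathbf{X}'\mathbf{X} + k\mathbf{I}_p)^{-1}\mathbf{X}'\mathbf{y}, \qquad (5.72)$$

which is based on using $\boldsymbol{\beta}^*$. Formula (5.72) can be written as

$$\hat{\mathbf{y}}_k = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}[\mathbf{I}_p + k(\mathbf{X}'\mathbf{X})^{-1}]^{-1}\mathbf{X}'\mathbf{y}. \qquad (5.73)$$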

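From Theorem 5.5.2, if all the eigenvalues of $k(\mathbf{X}'\mathbf{X})^{-1}$ are less than one in absolute value, then

$$[\mathbf{I}_p + k(\mathbf{X}'\mathbf{X})^{-1}]^{-1} = \sum_{i=0}^{\infty}(-1)^i k^i(\mathbf{X}'\mathbf{X})^{-i}. \qquad (5.74)$$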

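From (5.73) and (5.74) we get

$$\hat{\mathbf{y}}_k = (\mathbf{H}_1 - k\mathbf{H}_2 + k^2\mathbf{H}_3 - k^3\mathbf{H}_4 + \cdots)\mathbf{y},$$

where $\mathbf{H}_i = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-i}\mathbf{X}'$, $i \ge 1$. Thus the ridge regression residual sum of squares, which is the sum of squares of deviations of the elements of $\mathbf{y}$ from the corresponding elements of $\hat{\mathbf{y}}_k$, is

$$(\mathbf{y} - \hat{\mathbf{y}}_k)'(\mathbf{y} - \hat{\mathbf{y}}_k) = \mathbf{y}'\mathbf{Q}\mathbf{y},$$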


where

$$\mathbf{Q} = (\mathbf{I}_n - \mathbf{H}_1 + k\mathbf{H}_2 - k^2\mathbf{H}_3 + k^3\mathbf{H}_4 - \cdots)^2.$$

It can be shown (see Exercise 5.32) that

$$\mathbf{y}'\mathbf{Q}\mathbf{y} = SS_E + \sum_{i=3}^{\infty}(i - 2)(-k)^{i-1}S_i, \qquad (5.75)$$


where $SS_E$ is the usual least-squares residual sum of squares, which can be obtained when $k = 0$, that is,

$$SS_E = \mathbf{y}'[\mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{y},$$

and $S_i = \mathbf{y}'\mathbf{H}_i\mathbf{y}$, $i \ge 3$. The terms to the right of (5.75), other than $SS_E$, are bias sums of squares induced by the presence of a nonzero $k$. Draper and Herzberg (1987) demonstrated by means of an example that the series in (5.75) may diverge or else converge very slowly, depending on the value of $k$.
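The expansion (5.75) can be checked numerically. The sketch below assumes NumPy and makes the following illustrative choices:

- simulated data, with one column partly copied into another to induce collinearity;
- columns centered and scaled to unit length, to mimic correlation form;
- $k$ equal to one tenth of the smallest eigenvalue of $\mathbf{X}'\mathbf{X}$, so that all eigenvalues of $k(\mathbf{X}'\mathbf{X})^{-1}$ are below one, as Theorem 5.5.2 requires.

The truncated series, with terms $(i - 2)(-k)^{i-1}S_i$, is compared with the residual sum of squares computed directly from $\boldsymbol{\beta}^*$.

```python
import numpy as np


def ridge_rss_direct(X, y, k):
    """Residual sum of squares of the ridge fit X (X'X + k I)^(-1) X'y."""
    p = X.shape[1]
    beta_star = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    resid = y - X @ beta_star
    return resid @ resid


def ridge_rss_series(X, y, k, n_terms=60):
    """SS_E + sum_{i=3}^{n_terms} (i - 2) (-k)^(i - 1) S_i with S_i = y' X (X'X)^(-i) X' y."""
    XtX_inv = np.linalg.inv(X.T @ X)
    Xty = X.T @ y
    ss_e = y @ y - Xty @ XtX_inv @ Xty          # y'(I - H_1)y
    total = ss_e
    M = XtX_inv @ XtX_inv                       # (X'X)^(-2)
    for i in range(3, n_terms + 1):
        M = M @ XtX_inv                         # now (X'X)^(-i)
        S_i = Xty @ M @ Xty
        total += (i - 2) * (-k) ** (i - 1) * S_i
    return total


rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))
X[:, 2] += 0.8 * X[:, 0]                          # induce some collinearity
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)                 # unit-length columns
y = X @ np.array([1.0, -0.5, 0.3]) + 0.2 * rng.normal(size=n)
y = (y - y.mean()) / np.linalg.norm(y - y.mean())

k = 0.1 * np.linalg.eigvalsh(X.T @ X).min()       # keeps k (X'X)^(-1) eigenvalues below 1
print(ridge_rss_direct(X, y, k), ridge_rss_series(X, y, k))
```

If $k$ is pushed toward or beyond the smallest eigenvalue of $\mathbf{X}'\mathbf{X}$, the truncated series stops tracking the direct value. This matches the slow convergence or divergence noted by Draper and Herzberg.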


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Apostol, T. M. (1964). Mathematical Analysis. Addison-Wesley, Reading, Massachusetts. (Infinite series are discussed in Chap. 12.)

Boyer, C. B. (1968). A History of Mathematics. Wiley, New York. 

Draper, N. R., and A. M. Herzberg (1987). “A ridge-regression sidelight.” Amer. 
Statist., 41, 282-283. 

Draper, N. R., and H. Smith (1981). Applied Regression Analysis, 2nd ed. Wiley, New York. (Chap. 6 discusses ridge regression in addition to the various statistical procedures for selecting variables in a regression model.)

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I, 3rd 
ed. Wiley, New York. 

Fisher, R. A., and E. A. Cornish (1960). “The percentile points of distributions having known cumulants.” Technometrics, 2, 209-225.

Fisher, R. A., A. S. Corbet, and C. B. Williams (1943). “The relation between the 
number of species and the number of individuals in a random sample of an 
animal population.” J. Anim. Ecology, 12, 42-58. 

Fisz, M. (1963). Probability Theory and Mathematical Statistics, 3rd ed. Wiley, New 
York. (Chap. 5 deals almost exclusively with limit distributions for sums of 
independent random variables.) 

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York. (Chap. 2 discusses limits of sequences; Chap. 13 deals with infinite series of constant terms; Chap. 14 deals with sequences and series of functions; Chap. 15 provides a study of power series.)

Gantmacher, F. R. (1959). The Theory of Matrices, Vol. I. Chelsea, New York. 





Graybill, F. A. (1983). Matrices with Applications in Statistics, 2nd ed. Wadsworth, Belmont, California. (Chap. 5 includes a section on sequences and series of matrices.)

Hirschman, I. I., Jr. (1962). Infinite Series. Holt, Rinehart and Winston, New York. (This book is designed to be used in applied courses beyond the advanced calculus level. It emphasizes applications of the theory of infinite series.)

Hoerl, A. E., and R. W. Kennard (1970a). “Ridge regression: Biased estimation for non-orthogonal problems.” Technometrics, 12, 55-67.

Hoerl, A. E., and R. W. Kennard (1970b). “Ridge regression: Applications to non-orthogonal problems.” Technometrics, 12, 69-82; Correction, 12, 723.

Hogg, R. V., and A. T. Craig (1965). Introduction to Mathematical Statistics, 2nd ed. 
Macmillan, New York. 

Hyslop, J. M. (1954). Infinite Series, 5th ed. Oliver and Boyd, Edinburgh. (This book presents a concise treatment of the theory of infinite series. It provides the basic elements of this theory in a clear and easy-to-follow manner.)

Johnson, N. L., and S. Kotz (1969). Discrete Distributions. Houghton Mifflin, Boston. (Chaps. 1 and 2 contain discussions concerning moments, cumulants, generating functions, and power series distributions.)

Kendall, M., and A. Stuart (1977). The Advanced Theory of Statistics, Vol. 1, 4th ed. 
Macmillan, New York. (Moments, cumulants, and moment generating functions 
are discussed in Chap. 3.) 

Knopp, K. (1951). Theory and Application of Infinite Series. Blackie and Son, London. (This reference book provides a detailed and comprehensive study of the theory of infinite series. It contains many interesting examples.)

Kosambi, D. D. (1949). “Characteristic properties of series distributions.” Proc. Nat. Inst. Sci. India, 15, 109-113.

Lancaster, P. (1969). Theory of Matrices. Academic Press, New York. (Chap. 5 discusses functions of matrices in addition to sequences and series involving matrix terms.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (Chap. 2 
contains a section on moments of a distribution and a proof of Markov’s 
inequality.) 

Montgomery, D. C., and E. A. Peck (1982). Introduction to Linear Regression Analysis. Wiley, New York. (Chap. 8 discusses the effect of multicollinearity and the methods for dealing with it, including ridge regression.)

Nurcombe, J. R. (1979). “A sequence of convergence tests.” Amer. Math. Monthly, 86, 679-681.

Ofir, C., and A. I. Khuri (1986). “Multicollinearity in marketing models: Diagnostics and remedial measures.” Internat. J. Res. Market., 3, 181-205. (This is a review article that surveys the problem of multicollinearity in linear models and the various remedial measures for dealing with it.)

Pérez-Abreu, V. (1991). “Poisson approximation to power series distributions.” Amer. Statist., 45, 42-45.

Pye, W. C., and P. G. Webster (1989). “A note on Raabe's test extended.” Math. Comput. Ed., 23, 125-128.


Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New 
York. (Chap. 2 contains a section on limit theorems in statistics, including the 
weak and strong laws of large numbers.) 

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New York. (Sequences and series of scalar constants are discussed in Chap. 3; sequences and series of functions are studied in Chap. 7.)

Thiele, T. N. (1903). Theory of Observations. Layton, London. Reprinted in Ann. 
Math. Statist., 2, 165-307 (1931). 

Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. (Chap. 4 discusses different types of convergence of random variables; Chap. 5 presents several results concerning the moments of a distribution.)

Withers, C. S. (1984). “Asymptotic expansions for distributions and quantiles with power series cumulants.” J. Roy. Statist. Soc. Ser. B, 46, 389-396.


EXERCISES 


In Mathematics 


5.1. Suppose that $\{a_n\}_{n=1}^{\infty}$ is a bounded sequence of positive terms.

(a) Define $b_n = \max\{a_1, a_2, \ldots, a_n\}$, $n = 1, 2, \ldots$. Show that the sequence $\{b_n\}_{n=1}^{\infty}$ converges, and identify its limit.

(b) Suppose further that $a_n \to c$ as $n \to \infty$, where $c > 0$. Show that $c_n \to c$, where $\{c_n\}_{n=1}^{\infty}$ is the sequence of geometric means, $c_n = (\prod_{i=1}^{n}a_i)^{1/n}$.

5.2. Suppose that $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are any two Cauchy sequences. Let $d_n = |a_n - b_n|$, $n = 1, 2, \ldots$. Show that the sequence $\{d_n\}_{n=1}^{\infty}$ converges.

5.3. Prove Theorem 5.1.3. 


5.4. Show that if $\{a_n\}_{n=1}^{\infty}$ is a bounded sequence, then the set $E$ of all its subsequential limits is also bounded.


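5.5. Suppose that $a_n \to c$ as $n \to \infty$ and that $\{\alpha_i\}_{i=1}^{\infty}$ is a sequence of positive terms for which $\sum_{i=1}^{n}\alpha_i \to \infty$ as $n \to \infty$.

(a) Show that

$$\frac{\sum_{i=1}^{n}\alpha_i a_i}{\sum_{i=1}^{n}\alpha_i} \to c \quad \text{as } n \to \infty.$$

In particular, if $\alpha_i = 1$ for all $i$, then

$$\frac{1}{n}\sum_{i=1}^{n}a_i \to c \quad \text{as } n \to \infty.$$

(b) Show that the converse of the special case in (a) does not always hold by giving a counterexample of a sequence $\{a_n\}_{n=1}^{\infty}$ that does not converge, yet $(1/n)\sum_{i=1}^{n}a_i$ converges as $n \to \infty$.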

5.6. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of positive terms such that

$$\frac{a_{n+1}}{a_n} \to b \quad \text{as } n \to \infty,$$

where $0 < b < 1$. Show that there exist constants $c$ and $r$ such that $0 < r < 1$ and $c > 0$ for which $a_n < cr^n$ for sufficiently large values of $n$.

5.7. Suppose that we have the sequence $\{a_n\}_{n=1}^{\infty}$, where $a_1 = 1$ and

$$a_{n+1} = \frac{a_n(3b + a_n^2)}{3a_n^2 + b}, \quad b > 0, \; n = 1, 2, \ldots.$$

Show that the sequence converges, and find its limit.

5.8. Show that the sequence $\{a_n\}_{n=1}^{\infty}$ converges, and find its limit, where $a_1 = 1$ and

fl„ + i = + n = l,2,.... 

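5.9. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence and $s_n = \sum_{i=1}^{n}a_i$.

(a) Show that $\limsup_{n\to\infty}(s_n/n) \le \limsup_{n\to\infty}a_n$.

(b) If $s_n/n$ converges as $n \to \infty$, then show that $a_n/n \to 0$ as $n \to \infty$.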

5.10. Show that the sequence $\{s_n\}_{n=1}^{\infty}$, where $s_n = \sum_{i=1}^{n}(1/i)$, is not a Cauchy sequence and is therefore divergent.

5.11. Suppose that the sequence $\{a_n\}_{n=1}^{\infty}$ satisfies the following condition: There is an $r$, $0 < r < 1$, such that

$$|a_{n+1} - a_n| < br^n, \quad n = 1, 2, \ldots,$$

where $b$ is a positive constant. Show that this sequence converges.


5.12. Show that if $a_n \ge 0$ for all $n$, then $\sum_{n=1}^{\infty}a_n$ converges if and only if $\{s_n\}_{n=1}^{\infty}$ is a bounded sequence, where $s_n$ is the $n$th partial sum of the series.

5.13. Show that the series $\sum_{n=1}^{\infty}[1/((3n - 1)(3n + 2))]$ converges to $\frac{1}{6}$.

5.14. Show that the series $\sum_{n=1}^{\infty}(\cdots - 1)^p$ is divergent for $p < 1$.

5.15. Let $\sum_{n=1}^{\infty}a_n$ be a divergent series of positive terms.

(a) If $\{a_n\}_{n=1}^{\infty}$ is a bounded sequence, then show that $\sum_{n=1}^{\infty}[a_n/(1 + a_n)]$ diverges.



(b) Show that (a) is true even if $\{a_n\}_{n=1}^{\infty}$ is not a bounded sequence.

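5.16. Let $\sum_{n=1}^{\infty}a_n$ be a divergent series of positive terms. Show that

$$\frac{a_n}{s_n^2} \le \frac{1}{s_{n-1}} - \frac{1}{s_n}, \quad n = 2, 3, \ldots,$$

where $s_n$ is the $n$th partial sum of the series; then deduce that $\sum_{n=1}^{\infty}(a_n/s_n^2)$ converges.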


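5.17. Let $\sum_{n=1}^{\infty}a_n$ be a convergent series of positive terms. Let $r_n = \sum_{i=n}^{\infty}a_i$. Show that for $m < n$,

$$\sum_{i=m}^{n}\frac{a_i}{r_i} > 1 - \frac{r_n}{r_m},$$

and deduce that $\sum_{n=1}^{\infty}(a_n/r_n)$ diverges.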

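5.18. Given the two series $\sum_{n=1}^{\infty}(1/n)$ and $\sum_{n=1}^{\infty}(1/n^2)$. Show that

$$\lim_{n\to\infty}\left(\frac{1}{n}\right)^{1/n} = 1, \qquad \lim_{n\to\infty}\left(\frac{1}{n^2}\right)^{1/n} = 1.$$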


5.19. Test for convergence of the series $\sum_{n=1}^{\infty}a_n$, where


(a) 

(b) 

(c) 

(d) 

(e) 

(f) 


log(l+n) 
log(l + e”') ’ 

1X3X5X X(2n-1) 1 

” 2 X 4 X 6 X ••• X 2tî 2tî + 1 ’ 

ün = ]/n — }fn , 




f M 


sin 


1 /r H 

TT 



l n j 






5.20. Determine the values of $x$ for which each of the following series converges uniformly:


(a) 

(b) 

(c) 

(d) 


CO 


n -\-2 


E — — 


n = l 


3" 


E 

n = l 


10 ” 

n 


E (« + 1)^^", 

n = l 


$\sum_{n=1}^{\infty} \dfrac{\cos(nx)}{n(n^2 + 1)}$.


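5.21. Consider the series

$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}.$$

Let $\sum_{n=1}^{\infty} b_n$ be a certain rearrangement of $\sum_{n=1}^{\infty} a_n$ given by

$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots,$$

where two positive terms are followed by one negative. Show that the sum of the original series is less than $\frac{5}{6}$, whereas that of the rearranged series (which is convergent) exceeds $\frac{5}{6}$.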

5.22. Consider Cauchy's product of $\sum_{n=0}^{\infty} a_n$ with itself, where

$$a_n = \frac{(-1)^n}{\sqrt{n+1}}, \qquad n = 0, 1, 2, \ldots.$$

Show that this product is divergent. [Hint: Show that the $n$th term of this product does not go to zero as $n \to \infty$.]


5.23. Consider the sequence of functions $\{f_n(x)\}_{n=1}^{\infty}$, where for $n = 1, 2, \ldots$

$$f_n(x) = \frac{nx}{1 + nx^2}, \qquad x \ge 0.$$

Find the limit of this sequence, and determine whether or not the convergence is uniform on $[0, \infty)$.





5.24. Consider the series $\sum_{n=1}^{\infty} (1/n^x)$.

(a) Show that this series converges uniformly on $[1 + \delta, \infty)$, where $\delta$ is any positive number. [Note: The function represented by this series is known as Riemann's $\zeta$-function and is denoted by $\zeta(x)$.]

(b) Is $\zeta(x)$ differentiable on $[1 + \delta, \infty)$? If so, give a series expansion for $\zeta'(x)$.

In Statistics 

5.25. Prove formulas (5.56) and (5.57). 

5.26. Find a series expansion for the moment generating function of the negative binomial distribution. For what values of $t$ does this series converge uniformly? In this case, can formula (5.63) be applied to obtain an expression for the $k$th noncentral moment ($k = 1, 2, \ldots$) of this distribution? Why or why not?

5.27. Find the first three cumulants of the negative binomial distribution.

5.28. Show that the moments $\mu_n'$ ($n = 1, 2, \ldots$) of a random variable $X$ determine the cumulative distribution function of $X$ uniquely if

$$\limsup_{n\to\infty} \frac{|\mu_n'|^{1/n}}{n}$$

is finite. [Hint: Use the fact that $n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ as $n \to \infty$.]

5.29. Find the moment generating function of the logarithmic series distribution, and deduce that the mean and variance of this distribution are given by

$$\mu = \frac{\alpha\theta}{1-\theta}, \qquad \sigma^2 = \mu\left(\frac{1}{1-\theta} - \mu\right),$$

where $\alpha = -1/\log(1-\theta)$.

5.30. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of binomial random variables where the probability mass function of $X_n$ ($n = 1, 2, \ldots$) is given by

$$p_n(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n,$$





where $0 < p < 1$. Further, let the random variable $Y_n$ be defined as

$$Y_n = \frac{X_n}{n} - p.$$

(a) Show that $E(X_n) = np$ and $\operatorname{Var}(X_n) = np(1-p)$.

(b) Apply Chebyshev's inequality to show that

$$P(|Y_n| \ge \delta) \le \frac{p(1-p)}{n\delta^2},$$

where $\delta > 0$.

(c) Deduce from (b) that $Y_n$ converges in probability to zero. [Note: This result is known as Bernoulli's law of large numbers.]


5.31. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of independent Bernoulli random variables with success probability $p_n$. Let $S_n = \sum_{i=1}^{n} X_i$. Suppose that $np_n \to \mu > 0$ as $n \to \infty$.

(a) Give an expression for $\phi_n(t)$, the moment generating function of $S_n$.

(b) Show that

$$\lim_{n\to\infty} \phi_n(t) = \exp(\mu e^t - \mu),$$

which is the moment generating function of a Poisson distribution with mean $\mu$.
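As a numerical companion to Exercise 5.31(b), offered only as a sketch and not a proof: for independent Bernoulli variables sharing the success probability $p_n$, the moment generating function of $S_n$ is $(1 - p_n + p_n e^t)^n$. The code below assumes the particular choice $p_n = \mu/n$ (so that $np_n = \mu$ exactly) and the illustrative values $\mu = 2$, $t = 0.5$; the function names are hypothetical.

```python
import math

def bernoulli_sum_mgf(t, n, p):
    """MGF of S_n = X_1 + ... + X_n with X_i i.i.d. Bernoulli(p)."""
    return (1.0 - p + p * math.exp(t)) ** n

def poisson_mgf(t, mu):
    """MGF of a Poisson distribution with mean mu: exp(mu e^t - mu)."""
    return math.exp(mu * math.exp(t) - mu)

mu, t = 2.0, 0.5
for n in (10, 100, 1000, 10**6):
    phi_n = bernoulli_sum_mgf(t, n, mu / n)  # assumption: p_n = mu / n
    print(f"n={n:>7}  phi_n(t)={phi_n:.8f}  limit={poisson_mgf(t, mu):.8f}")
```

The gap between the two columns shrinks roughly like $1/n$, in line with the limit stated in part (b).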


5.32. Prove formula (5.75). 



CHAPTER 6 


Integration


The origin of integral calculus can be traced back to the ancient Greeks. They were motivated by the need to measure the length of a curve, the area of a surface, or the volume of a solid. Archimedes used techniques very similar to actual integration to determine the length of a segment of a curve. Democritus (410 B.C.) had the insight to consider that a cone was made up of infinitely many plane cross sections parallel to the base.

The theory of integration received very little stimulus after Archimedes’s remarkable achievements. It was not until the beginning of the seventeenth century that interest in Archimedes’s ideas began to develop. Johann Kepler (1571-1630) was the first among European mathematicians to develop the ideas of infinitesimals in connection with integration. The use of the term “integral” is due to the Swiss mathematician Johann Bernoulli (1667-1748).

In the present chapter we shall study integration of real-valued functions of a single variable $x$ according to the concepts put forth by the German mathematician Georg Friedrich Riemann (1826-1866). He was the first to establish a rigorous analytical foundation for integration, based on the older geometric approach.

6.1. SOME BASIC DEFINITIONS 

Let $f(x)$ be a function defined and bounded on a finite interval $[a, b]$. Suppose that this interval is partitioned into a finite number of subintervals by a set of points $P = \{x_0, x_1, \ldots, x_n\}$ such that $a = x_0 < x_1 < x_2 < \cdots < x_n = b$. This set is called a partition of $[a, b]$. Let $\Delta x_i = x_i - x_{i-1}$ ($i = 1, 2, \ldots, n$), and let $\Delta_p$ be the largest of $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$. This value is called the norm of $P$. Consider the sum

$$S(P, f) = \sum_{i=1}^{n} f(t_i)\,\Delta x_i,$$

where $t_i$ is a point in the subinterval $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$.




The function $f(x)$ is said to be Riemann integrable on $[a, b]$ if a number $A$ exists with the following property: For any given $\epsilon > 0$ there exists a number $\delta > 0$ such that

$$|A - S(P, f)| < \epsilon$$

for any partition $P$ of $[a, b]$ with a norm $\Delta_p < \delta$, and for any choice of the point $t_i$ in $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. The number $A$ is called the Riemann integral of $f(x)$ on $[a, b]$ and is denoted by $\int_a^b f(x)\,dx$. The integration symbol $\int$ was first used by the German mathematician Gottfried Wilhelm Leibniz (1646-1716) to represent a sum (it was derived from the first letter of the Latin word summa, which means a sum).
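The definition can be tried out numerically. The sketch below forms $S(P, f)$ for random partitions with randomly chosen tags $t_i$ and shows the sums settling near $\int_0^1 x^2\,dx = 1/3$ as the norm shrinks. The helper names `riemann_sum`, `random_partition`, and `norm` are our own, not the book's, and a finite experiment like this illustrates the definition without proving anything.

```python
import random

def riemann_sum(f, points, tags):
    """S(P, f) = sum of f(t_i) * (x_i - x_{i-1}) over the subintervals of P."""
    return sum(f(t) * (b - a) for a, b, t in zip(points, points[1:], tags))

def random_partition(a, b, n, rng):
    """Partition a = x_0 < x_1 < ... < x_n = b with n - 1 random interior points."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    return [a] + interior + [b]

def norm(points):
    """The norm of a partition: the length of its longest subinterval."""
    return max(b - a for a, b in zip(points, points[1:]))

rng = random.Random(0)
f = lambda x: x * x  # on [0, 1] the Riemann integral is 1/3
for n in (10, 100, 1000, 10000):
    P = random_partition(0.0, 1.0, n, rng)
    tags = [rng.uniform(a, b) for a, b in zip(P, P[1:])]  # arbitrary t_i in [x_{i-1}, x_i]
    S = riemann_sum(f, P, tags)
    print(f"n={n:>5}  norm={norm(P):.1e}  |S(P,f) - 1/3|={abs(S - 1/3):.1e}")
```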


6.2. THE EXISTENCE OF THE RIEMANN INTEGRAL 

In order to investigate the existence of the Riemann integral, we shall need the following theorem:

Theorem 6.2.1. Let $f(x)$ be a bounded function on a finite interval $[a, b]$. For every partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$, let $m_i$ and $M_i$ be, respectively, the infimum and supremum of $f(x)$ on $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. If, for any given $\epsilon > 0$, there exists a $\delta > 0$ such that

$$US_P(f) - LS_P(f) < \epsilon \tag{6.1}$$

whenever $\Delta_p < \delta$, where $\Delta_p$ is the norm of $P$, and

$$LS_P(f) = \sum_{i=1}^{n} m_i\,\Delta x_i, \qquad US_P(f) = \sum_{i=1}^{n} M_i\,\Delta x_i,$$

then $f(x)$ is Riemann integrable on $[a, b]$. Conversely, if $f(x)$ is Riemann integrable, then inequality (6.1) holds for any partition $P$ such that $\Delta_p < \delta$. [The sums $LS_P(f)$ and $US_P(f)$ are called the lower sum and upper sum, respectively, of $f(x)$ with respect to the partition $P$.]

In order to prove Theorem 6.2.1 we need the following lemmas: 

Lemma 6.2.1. Let $P$ and $P'$ be two partitions of $[a, b]$ such that $P' \supset P$ ($P'$ is called a refinement of $P$ and is constructed by adding partition points between those that belong to $P$). Then

$$US_{P'}(f) \le US_P(f), \qquad LS_{P'}(f) \ge LS_P(f).$$





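Proof. Let $P = \{x_0, x_1, \ldots, x_n\}$. By the nature of the partition $P'$, the $i$th subinterval $\Delta x_i = x_i - x_{i-1}$ is divided into $k_i$ parts $\Delta_1^{(i)}, \Delta_2^{(i)}, \ldots, \Delta_{k_i}^{(i)}$, where $k_i \ge 1$, $i = 1, 2, \ldots, n$. If $m_j^{(i)}$ and $M_j^{(i)}$ denote, respectively, the infimum and supremum of $f(x)$ on $\Delta_j^{(i)}$, then $m_i \le m_j^{(i)} \le M_j^{(i)} \le M_i$ for $j = 1, 2, \ldots, k_i$; $i = 1, 2, \ldots, n$, where $m_i$ and $M_i$ are the infimum and supremum of $f(x)$ on $[x_{i-1}, x_i]$, respectively. It follows that

$$LS_P(f) = \sum_{i=1}^{n} m_i\,\Delta x_i \le \sum_{i=1}^{n}\sum_{j=1}^{k_i} m_j^{(i)}\,\Delta_j^{(i)} = LS_{P'}(f),$$

$$US_{P'}(f) = \sum_{i=1}^{n}\sum_{j=1}^{k_i} M_j^{(i)}\,\Delta_j^{(i)} \le \sum_{i=1}^{n} M_i\,\Delta x_i = US_P(f). \qquad \square$$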

Lemma 6.2.2. Let $P$ and $P'$ be any two partitions of $[a, b]$. Then $LS_P(f) \le US_{P'}(f)$.

Proof. Let $P'' = P \cup P'$. The partition $P''$ is a refinement of both $P$ and $P'$. Then, by Lemma 6.2.1,

$$LS_P(f) \le LS_{P''}(f) \le US_{P''}(f) \le US_{P'}(f). \qquad \square$$
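Lemmas 6.2.1 and 6.2.2 can be checked on a concrete case. For a monotone function the infimum and supremum on each subinterval sit at its endpoints, so lower and upper sums can be computed exactly. The sketch below assumes $f(x) = 1/(1+x)$ on $[0, 2]$ and uses a hypothetical `refine` helper that adds random points to a partition.

```python
import random

def lower_upper_sums_monotone(f, points, increasing=True):
    """Return (LS_P(f), US_P(f)) for a monotone f.

    For a monotone function the infimum and supremum on [x_{i-1}, x_i]
    are attained at the endpoints, so they can be read off exactly.
    """
    ls = us = 0.0
    for a, b in zip(points, points[1:]):
        lo, hi = (f(a), f(b)) if increasing else (f(b), f(a))
        ls += lo * (b - a)
        us += hi * (b - a)
    return ls, us

def refine(points, k, rng):
    """Hypothetical helper: a refinement P' of P obtained by adding k random points."""
    a, b = points[0], points[-1]
    return sorted(set(points) | {rng.uniform(a, b) for _ in range(k)})

f = lambda x: 1.0 / (1.0 + x)  # decreasing on [0, 2]
rng = random.Random(42)
P = [0.0, 0.5, 1.0, 2.0]
P1 = refine(P, 5, rng)    # P  is contained in P1
P2 = refine(P1, 50, rng)  # P1 is contained in P2
for name, Q in (("P", P), ("P'", P1), ("P''", P2)):
    ls, us = lower_upper_sums_monotone(f, Q, increasing=False)
    print(f"{name:4}  LS={ls:.6f}  US={us:.6f}")
```

Refining can only raise the lower sum and lower the upper sum, and every lower sum stays below every upper sum; all of them bracket $\int_0^2 dx/(1+x) = \log 3$.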

Proof of Theorem 6.2.1

Let $\epsilon > 0$ be given. Suppose that inequality (6.1) holds for any partition $P$ whose norm $\Delta_p$ is less than $\delta$. Let $S(P, f) = \sum_{i=1}^{n} f(t_i)\,\Delta x_i$, where $t_i$ is a point in $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. By the definition of $LS_P(f)$ and $US_P(f)$ we can write

$$LS_P(f) \le S(P, f) \le US_P(f). \tag{6.2}$$

Let $m$ and $M$ be the infimum and supremum, respectively, of $f(x)$ on $[a, b]$; then

$$m(b-a) \le LS_P(f) \le US_P(f) \le M(b-a). \tag{6.3}$$

Let us consider two sets of lower and upper sums of $f(x)$ with respect to partitions $P, P', P'', \ldots$ such that $P \subset P' \subset P'' \subset \cdots$. Then, by Lemma 6.2.1, the set of upper sums is decreasing, and the set of lower sums is increasing. Furthermore, because of (6.3), the set of upper sums is bounded from below by $m(b-a)$, and the set of lower sums is bounded from above by $M(b-a)$. Hence, the infimum of $US_P(f)$ and the supremum of $LS_P(f)$ with respect to $P$ do exist (see Theorem 1.5.1).

From Lemma 6.2.2 it is easy to deduce that

$$\sup_P LS_P(f) \le \inf_P US_P(f).$$





Now, suppose that for the given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$US_P(f) - LS_P(f) < \epsilon \tag{6.4}$$

for any partition whose norm is less than $\delta$. We have that

$$LS_P(f) \le \sup_P LS_P(f) \le \inf_P US_P(f) \le US_P(f). \tag{6.5}$$

Hence,

$$\inf_P US_P(f) - \sup_P LS_P(f) < \epsilon.$$

Since $\epsilon > 0$ is arbitrary, we conclude that if (6.1) is satisfied, then

$$\inf_P US_P(f) = \sup_P LS_P(f). \tag{6.6}$$

Furthermore, from (6.2), (6.4), and (6.5) we obtain

$$|S(P, f) - A| < \epsilon,$$

where $A$ is the common value of $\inf_P US_P(f)$ and $\sup_P LS_P(f)$. This proves that $A$ is the Riemann integral of $f(x)$ on $[a, b]$.

Let us now show that the converse of the theorem is true, that is, if $f(x)$ is Riemann integrable on $[a, b]$, then inequality (6.1) holds.

If $f(x)$ is Riemann integrable, then for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$\left|\sum_{i=1}^{n} f(t_i)\,\Delta x_i - A\right| < \frac{\epsilon}{3} \tag{6.7}$$

and

$$\left|\sum_{i=1}^{n} f(t_i')\,\Delta x_i - A\right| < \frac{\epsilon}{3} \tag{6.8}$$

for any partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$ with a norm $\Delta_p < \delta$ and any choices of $t_i$, $t_i'$ in $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, where $A = \int_a^b f(x)\,dx$. From (6.7) and (6.8) we then obtain

$$\left|\sum_{i=1}^{n} [f(t_i) - f(t_i')]\,\Delta x_i\right| < \frac{2\epsilon}{3}.$$

Now, $M_i - m_i$ is the supremum of $f(x) - f(x')$ for $x, x'$ in $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. It follows that for a given $\eta > 0$ we can choose $t_i$, $t_i'$ in $[x_{i-1}, x_i]$ so that

$$f(t_i) - f(t_i') > M_i - m_i - \eta, \qquad i = 1, 2, \ldots, n,$$

for otherwise $M_i - m_i - \eta$ would be an upper bound for $f(x) - f(x')$ for all $x, x'$ in $[x_{i-1}, x_i]$, which is a contradiction. In particular, if $\eta = \epsilon/[3(b-a)]$, then we can find $t_i$, $t_i'$ in $[x_{i-1}, x_i]$ such that

$$US_P(f) - LS_P(f) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i < \sum_{i=1}^{n} [f(t_i) - f(t_i')]\,\Delta x_i + \eta(b-a) < \epsilon.$$

This proves the validity of inequality (6.1). □

Corollary 6.2.1. Let $f(x)$ be a bounded function on $[a, b]$. Then $f(x)$ is Riemann integrable on $[a, b]$ if and only if $\inf_P US_P(f) = \sup_P LS_P(f)$, where $LS_P(f)$ and $US_P(f)$ are, respectively, the lower and upper sums of $f(x)$ with respect to a partition $P$ of $[a, b]$.

Proof. See Exercise 6.1. □

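Example 6.2.1. Let $f(x)\colon [0, 1] \to R$ be the function $f(x) = x^2$. Then, $f(x)$ is Riemann integrable on $[0, 1]$. To show this, let $P = \{x_0, x_1, \ldots, x_n\}$ be any partition of $[0, 1]$, where $x_0 = 0$, $x_n = 1$. Then

$$LS_P(f) = \sum_{i=1}^{n} x_{i-1}^2\,\Delta x_i, \qquad US_P(f) = \sum_{i=1}^{n} x_i^2\,\Delta x_i.$$

Hence,

$$US_P(f) - LS_P(f) = \sum_{i=1}^{n} (x_i^2 - x_{i-1}^2)\,\Delta x_i \le \Delta_p \sum_{i=1}^{n} (x_i^2 - x_{i-1}^2),$$

where $\Delta_p$ is the norm of $P$. But

$$\sum_{i=1}^{n} (x_i^2 - x_{i-1}^2) = x_n^2 - x_0^2 = 1.$$

Thus

$$US_P(f) - LS_P(f) \le \Delta_p.$$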





It follows that for a given $\epsilon > 0$ we can choose $\delta = \epsilon$ such that for any partition $P$ whose norm is less than $\delta$,

$$US_P(f) - LS_P(f) < \epsilon.$$

By Theorem 6.2.1, $f(x) = x^2$ is Riemann integrable on $[0, 1]$.
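The bound $US_P(f) - LS_P(f) \le \Delta_p$ from Example 6.2.1 is easy to test on random partitions. A minimal sketch, assuming partitions of $[0, 1]$ with uniformly random interior points:

```python
import random

def gap_x_squared(points):
    """US_P(f) - LS_P(f) for f(x) = x**2 on a partition of [0, 1].

    f is increasing on [0, 1], so M_i = x_i**2 and m_i = x_{i-1}**2.
    """
    return sum((b * b - a * a) * (b - a) for a, b in zip(points, points[1:]))

rng = random.Random(3)
for n in (5, 50, 500):
    P = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    norm_p = max(b - a for a, b in zip(P, P[1:]))
    print(f"n={n:>3}  US-LS={gap_x_squared(P):.4f}  norm={norm_p:.4f}")
```

For the uniform partition with $n$ subintervals the two quantities coincide (both equal $1/n$), so the bound cannot be improved in general.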

Example 6.2.2. Consider the function $f(x)\colon [0, 1] \to R$ such that $f(x) = 0$ if $x$ is a rational number and $f(x) = 1$ if $x$ is irrational. Since every subinterval of $[0, 1]$ contains both rational and irrational numbers, then for any partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[0, 1]$ we have

$$US_P(f) = \sum_{i=1}^{n} M_i\,\Delta x_i = \sum_{i=1}^{n} \Delta x_i = 1, \qquad LS_P(f) = \sum_{i=1}^{n} m_i\,\Delta x_i = \sum_{i=1}^{n} 0\cdot\Delta x_i = 0.$$

It follows that

$$\inf_P US_P(f) = 1 \quad\text{and}\quad \sup_P LS_P(f) = 0.$$

By Corollary 6.2.1, $f(x)$ is not Riemann integrable on $[0, 1]$.


6.3. SOME CLASSES OF FUNCTIONS THAT ARE 
RIEMANN INTEGRABLE 

There are certain classes of functions that are Riemann integrable. Identifying a given function as a member of such a class can facilitate the determination of its Riemann integrability. Some of these classes of functions include: (i) continuous functions; (ii) monotone functions; (iii) functions of bounded variation.

Theorem 6.3.1. If $f(x)$ is continuous on $[a, b]$, then it is Riemann integrable there.

Proof. Since $f(x)$ is continuous on a closed and bounded interval, then by Theorem 3.4.6 it must be uniformly continuous on $[a, b]$. Consequently, for a given $\epsilon > 0$ there exists a $\delta > 0$ that depends only on $\epsilon$ such that for any $x_1$, $x_2$ in $[a, b]$ we have

$$|f(x_1) - f(x_2)| < \frac{\epsilon}{b-a}$$

if $|x_1 - x_2| < \delta$. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a, b]$ with a norm $\Delta_p < \delta$. Then

$$US_P(f) - LS_P(f) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i,$$

where $m_i$ and $M_i$ are, respectively, the infimum and supremum of $f(x)$ on $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. By Corollary 3.4.1 there exist points $\xi_i$, $\eta_i$ in $[x_{i-1}, x_i]$ such that $m_i = f(\xi_i)$, $M_i = f(\eta_i)$, $i = 1, 2, \ldots, n$. Since $|\eta_i - \xi_i| < \delta$ for $i = 1, 2, \ldots, n$, then

$$US_P(f) - LS_P(f) = \sum_{i=1}^{n} [f(\eta_i) - f(\xi_i)]\,\Delta x_i < \frac{\epsilon}{b-a}\sum_{i=1}^{n} \Delta x_i = \epsilon.$$

By Theorem 6.2.1 we conclude that $f(x)$ is Riemann integrable on $[a, b]$. □


It should be noted that continuity is a sufficient condition for Riemann integrability, but not a necessary one. A function $f(x)$ can have discontinuities in $[a, b]$ and still be Riemann integrable on $[a, b]$. For example, consider the function

$$f(x) = \begin{cases} -1, & -1 \le x < 0,\\ 1, & 0 \le x \le 1. \end{cases}$$

This function is discontinuous at $x = 0$. However, it is Riemann integrable on $[-1, 1]$. To show this, let $\epsilon > 0$ be given, and let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[-1, 1]$ such that $\Delta_p < \epsilon/2$. By the nature of this function, $f(x_i) - f(x_{i-1}) \ge 0$, and the infimum and supremum of $f(x)$ on $[x_{i-1}, x_i]$ are equal to $f(x_{i-1})$ and $f(x_i)$, respectively, $i = 1, 2, \ldots, n$. Hence,

$$\begin{aligned}
US_P(f) - LS_P(f) &= \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i\\
&= \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})]\,\Delta x_i\\
&\le \Delta_p \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})] = \Delta_p[f(1) - f(-1)]\\
&< \frac{\epsilon}{2}[f(1) - f(-1)] = \epsilon.
\end{aligned}$$





The function $f(x)$ is therefore Riemann integrable on $[-1, 1]$ by Theorem 6.2.1.

On the basis of this example it is now easy to prove the following theorem:

Theorem 6.3.2. If $f(x)$ is monotone increasing (or monotone decreasing) on $[a, b]$, then it is Riemann integrable there.

Theorem 6.3.2 can be used to construct a function that has a countable number of discontinuities in $[a, b]$ and is also Riemann integrable (see Exercise 6.2).

6.3.1. Functions of Bounded Variation 

Let $f(x)$ be defined on $[a, b]$. This function is said to be of bounded variation on $[a, b]$ if there exists a number $M > 0$ such that for any partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$ we have

$$\sum_{i=1}^{n} |\Delta f_i| \le M,$$

where $\Delta f_i = f(x_i) - f(x_{i-1})$, $i = 1, 2, \ldots, n$.

Any function that is monotone increasing (or decreasing) on $[a, b]$ is also of bounded variation there. To show this, let $f(x)$ be monotone increasing on $[a, b]$. Then

$$\sum_{i=1}^{n} |\Delta f_i| = \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})] = f(b) - f(a).$$

Hence, if $M$ is any number greater than or equal to $f(b) - f(a)$, then $\sum_{i=1}^{n} |\Delta f_i| \le M$ for any partition $P$ of $[a, b]$.

Another example of a function of bounded variation is given in the next 
theorem. 

Theorem 6.3.3. If $f(x)$ is continuous on $[a, b]$ and its derivative $f'(x)$ exists and is bounded on $(a, b)$, then $f(x)$ is of bounded variation on $[a, b]$.

Proof. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a, b]$. By applying the mean value theorem (Theorem 4.2.2) on each $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, we obtain

$$\sum_{i=1}^{n} |\Delta f_i| = \sum_{i=1}^{n} |f'(\xi_i)\,\Delta x_i| \le K\sum_{i=1}^{n} \Delta x_i = K(b-a),$$

where $x_{i-1} < \xi_i < x_i$, $i = 1, 2, \ldots, n$, and $K > 0$ is such that $|f'(x)| \le K$ on $(a, b)$. □

It should be noted that any function of bounded variation on $[a, b]$ is also bounded there. This is true because if $a < x < b$, then $P = \{a, x, b\}$ is a partition of $[a, b]$. Hence,

$$|f(x) - f(a)| + |f(b) - f(x)| \le M$$

for some positive number $M$. This implies that $|f(x)|$ is bounded on $[a, b]$ since

$$\begin{aligned}
|f(x)| &\le \tfrac{1}{2}\bigl[|f(x) - f(a)| + |f(x) - f(b)| + |f(a) + f(b)|\bigr]\\
&\le \tfrac{1}{2}\bigl[M + |f(a) + f(b)|\bigr].
\end{aligned}$$

The converse of this result, however, is not necessarily true; that is, a bounded function need not be of bounded variation. For example, the function

$$f(x) = \begin{cases} x\cos\left(\dfrac{\pi}{2x}\right), & 0 < x \le 1,\\ 0, & x = 0, \end{cases}$$

is bounded on $[0, 1]$, but is not of bounded variation there. It can be shown that for the partition

$$P = \left\{0, \frac{1}{2n}, \frac{1}{2n-1}, \ldots, \frac{1}{3}, \frac{1}{2}, 1\right\},$$

$\sum_{i=1}^{2n} |\Delta f_i| \to \infty$ as $n \to \infty$ and hence cannot be bounded by a constant $M$ for all $n$ (see Exercise 6.4).
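The growth claimed for $x\cos(\pi/(2x))$ can be watched numerically. On the partition above, the points $1/k$ with $k$ odd give $f = 0$, while $f(1/(2m)) = (-1)^m/(2m)$, so each such point contributes $1/(2m)$ twice; by this short calculation (ours, not the book's) the sum equals $1 + \frac12 + \cdots + \frac1n$, which diverges. The sketch below compares the computed sum with that harmonic number; the function names are hypothetical.

```python
import math

def f(x):
    """f(x) = x cos(pi/(2x)) for 0 < x <= 1, and f(0) = 0."""
    return x * math.cos(math.pi / (2.0 * x)) if x > 0 else 0.0

def variation_on_partition(g, points):
    """Sum of |g(x_i) - g(x_{i-1})| over consecutive partition points."""
    return sum(abs(g(b) - g(a)) for a, b in zip(points, points[1:]))

def special_partition(n):
    """The partition {0, 1/(2n), 1/(2n-1), ..., 1/3, 1/2, 1} of [0, 1]."""
    return [0.0] + [1.0 / k for k in range(2 * n, 0, -1)]

for n in (1, 10, 100, 1000):
    v = variation_on_partition(f, special_partition(n))
    h_n = sum(1.0 / m for m in range(1, n + 1))
    print(f"n={n:>4}  variation={v:.6f}  H_n={h_n:.6f}")
```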

Theorem 6.3.4. If $f(x)$ is of bounded variation on $[a, b]$, then it is Riemann integrable there.

Proof. Let $\epsilon > 0$ be given, and let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a, b]$. Then

$$US_P(f) - LS_P(f) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i, \tag{6.9}$$

where $m_i$ and $M_i$ are the infimum and supremum of $f(x)$ on $[x_{i-1}, x_i]$, respectively, $i = 1, 2, \ldots, n$. By the properties of $m_i$ and $M_i$, there exist $\xi_i$ and $\eta_i$ in $[x_{i-1}, x_i]$ such that for $i = 1, 2, \ldots, n$,

$$m_i \le f(\xi_i) < m_i + \epsilon', \qquad M_i - \epsilon' < f(\eta_i) \le M_i,$$

where $\epsilon'$ is a small positive number to be determined later. It follows that

$$M_i - \epsilon' - (m_i + \epsilon') < f(\eta_i) - f(\xi_i), \qquad i = 1, 2, \ldots, n.$$

Hence,

$$M_i - m_i < 2\epsilon' + f(\eta_i) - f(\xi_i) \le 2\epsilon' + |f(\xi_i) - f(\eta_i)|, \qquad i = 1, 2, \ldots, n.$$

From formula (6.9) we obtain

$$US_P(f) - LS_P(f) < 2\epsilon'\sum_{i=1}^{n} \Delta x_i + \sum_{i=1}^{n} |f(\xi_i) - f(\eta_i)|\,\Delta x_i. \tag{6.10}$$

Now, if $\Delta_p$ is the norm of $P$, then

$$\sum_{i=1}^{n} |f(\xi_i) - f(\eta_i)|\,\Delta x_i \le \Delta_p \sum_{i=1}^{n} |f(\xi_i) - f(\eta_i)| \le \Delta_p \sum_{i=1}^{m} |f(z_i) - f(z_{i-1})|, \tag{6.11}$$

where $\{z_0, z_1, \ldots, z_m\}$ is a partition $Q$ of $[a, b]$, which consists of the points $x_0, x_1, \ldots, x_n$ as well as the points $\xi_1, \eta_1, \xi_2, \eta_2, \ldots, \xi_n, \eta_n$; that is, $Q$ is a refinement of $P$ obtained by adding the $\xi_i$'s and $\eta_i$'s ($i = 1, 2, \ldots, n$). Since $f(x)$ is of bounded variation on $[a, b]$, there exists a number $M > 0$ such that

$$\sum_{i=1}^{m} |f(z_i) - f(z_{i-1})| \le M. \tag{6.12}$$

From (6.10), (6.11), and (6.12) it follows that

$$US_P(f) - LS_P(f) < 2\epsilon'(b-a) + M\Delta_p. \tag{6.13}$$

Let us now select the partition $P$ such that $\Delta_p < \delta$, where $M\delta < \epsilon/2$. If we also choose $\epsilon'$ such that $2\epsilon'(b-a) < \epsilon/2$, then from (6.13) we obtain $US_P(f) - LS_P(f) < \epsilon$. The function $f(x)$ is therefore Riemann integrable on $[a, b]$ by Theorem 6.2.1. □





6.4. PROPERTIES OF THE RIEMANN INTEGRAL 

The Riemann integral has several properties that are useful at both the theoretical and practical levels. Most of these properties are fairly simple and straightforward. We shall therefore not prove every one of them in this section.

Theorem 6.4.1. If $f(x)$ and $g(x)$ are Riemann integrable on $[a, b]$ and if $c_1$ and $c_2$ are constants, then $c_1 f(x) + c_2 g(x)$ is Riemann integrable on $[a, b]$, and

$$\int_a^b [c_1 f(x) + c_2 g(x)]\,dx = c_1\int_a^b f(x)\,dx + c_2\int_a^b g(x)\,dx.$$

Theorem 6.4.2. If $f(x)$ is Riemann integrable on $[a, b]$, and $m \le f(x) \le M$ for all $x$ in $[a, b]$, then

$$m(b-a) \le \int_a^b f(x)\,dx \le M(b-a).$$

Theorem 6.4.3. If $f(x)$ and $g(x)$ are Riemann integrable on $[a, b]$, and if $f(x) \le g(x)$ for all $x$ in $[a, b]$, then

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

Theorem 6.4.4. If $f(x)$ is Riemann integrable on $[a, b]$ and if $a < c < b$, then

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

Theorem 6.4.5. If $f(x)$ is Riemann integrable on $[a, b]$, then so is $|f(x)|$ and

$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx.$$


Proof. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of $[a, b]$. Let $m_i$ and $M_i$ be the infimum and supremum of $f(x)$, respectively, on $[x_{i-1}, x_i]$; and let $m_i'$, $M_i'$ be the same for $|f(x)|$. We claim that

$$M_i - m_i \ge M_i' - m_i', \qquad i = 1, 2, \ldots, n.$$

It is obvious that $M_i - m_i = M_i' - m_i'$ if $f(x)$ is either nonnegative or nonpositive for all $x$ in $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$. Let us therefore suppose that $f(x)$ is negative on $D_i^-$ and nonnegative on $D_i^+$, where $D_i^-$ and $D_i^+$ are such that $D_i^- \cup D_i^+ = D_i = [x_{i-1}, x_i]$ for $i = 1, 2, \ldots, n$. We then have

$$\begin{aligned}
M_i - m_i &= \sup_{D_i^+} f(x) - \inf_{D_i^-} f(x)\\
&= \sup_{D_i^+} |f(x)| - \inf_{D_i^-} \bigl(-|f(x)|\bigr)\\
&= \sup_{D_i^+} |f(x)| + \sup_{D_i^-} |f(x)|\\
&\ge \sup_{D_i} |f(x)| = M_i',
\end{aligned}$$

since $\sup_{D_i} |f(x)| = \max\{\sup_{D_i^+} |f(x)|, \sup_{D_i^-} |f(x)|\}$. Hence, $M_i - m_i \ge M_i' \ge M_i' - m_i'$ for $i = 1, 2, \ldots, n$ (recall that $m_i' \ge 0$), which proves our claim. Consequently,

$$US_P(|f|) - LS_P(|f|) = \sum_{i=1}^{n} (M_i' - m_i')\,\Delta x_i \le \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i.$$

Hence,

$$US_P(|f|) - LS_P(|f|) \le US_P(f) - LS_P(f). \tag{6.14}$$

Since $f(x)$ is Riemann integrable, the right-hand side of inequality (6.14) can be made smaller than any given $\epsilon > 0$ by a proper choice of the norm of $P$. It follows that $|f(x)|$ is Riemann integrable on $[a, b]$ by Theorem 6.2.1. Furthermore, since $\pm f(x) \le |f(x)|$ for all $x$ in $[a, b]$, then $\int_a^b \pm f(x)\,dx \le \int_a^b |f(x)|\,dx$ by Theorem 6.4.3, that is,

$$\pm\int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.$$

Thus, $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$. □

Corollary 6.4.1. If $f(x)$ is Riemann integrable on $[a, b]$, then so is $f^2(x)$.

Proof. Using the same notation as in the proof of Theorem 6.4.5, we have that $m_i'^2$ and $M_i'^2$ are, respectively, the infimum and supremum of $f^2(x)$ on $[x_{i-1}, x_i]$ for $i = 1, 2, \ldots, n$. Now,

$$\begin{aligned}
M_i'^2 - m_i'^2 &= (M_i' - m_i')(M_i' + m_i')\\
&\le 2M'(M_i' - m_i')\\
&\le 2M'(M_i - m_i), \qquad i = 1, 2, \ldots, n,
\end{aligned} \tag{6.15}$$

where $M'$ is the supremum of $|f(x)|$ on $[a, b]$. The Riemann integrability of $f^2(x)$ now follows from inequality (6.15) by the Riemann integrability of $f(x)$. □

Corollary 6.4.2. If $f(x)$ and $g(x)$ are Riemann integrable on $[a, b]$, then so is their product $f(x)g(x)$.

Proof. This follows directly from the identity

$$4f(x)g(x) = [f(x) + g(x)]^2 - [f(x) - g(x)]^2, \tag{6.16}$$

and the fact that the squares of $f(x) + g(x)$ and $f(x) - g(x)$ are Riemann integrable on $[a, b]$ by Theorem 6.4.1 and Corollary 6.4.1. □

Theorem 6.4.6 (The Mean Value Theorem for Integrals). If $f(x)$ is continuous on $[a, b]$, then there exists a point $c \in [a, b]$ such that

$$\int_a^b f(x)\,dx = (b-a)f(c).$$

Proof. By Theorem 6.4.2 we have

$$m \le \frac{1}{b-a}\int_a^b f(x)\,dx \le M,$$

where $m$ and $M$ are, respectively, the infimum and supremum of $f(x)$ on $[a, b]$. Since $f(x)$ is continuous, then by Corollary 3.4.1 it must attain the values $m$ and $M$ at some points inside $[a, b]$. Furthermore, by the intermediate-value theorem (Theorem 3.4.4), $f(x)$ assumes every value between $m$ and $M$. Hence, there is a point $c \in [a, b]$ such that

$$f(c) = \frac{1}{b-a}\int_a^b f(x)\,dx.$$

□

Definition 6.4.1. Let $f(x)$ be Riemann integrable on $[a, b]$. The function

$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b,$$

is called an indefinite integral of $f(x)$. □


Theorem 6.4.7. If $f(x)$ is Riemann integrable on $[a, b]$, then $F(x) = \int_a^x f(t)\,dt$ is uniformly continuous on $[a, b]$.

Proof. Let $x_1$, $x_2$ be in $[a, b]$, $x_1 < x_2$. Then,

$$\begin{aligned}
|F(x_2) - F(x_1)| &= \left|\int_a^{x_2} f(t)\,dt - \int_a^{x_1} f(t)\,dt\right|\\
&= \left|\int_{x_1}^{x_2} f(t)\,dt\right|, && \text{by Theorem 6.4.4}\\
&\le \int_{x_1}^{x_2} |f(t)|\,dt, && \text{by Theorem 6.4.5}\\
&\le M'(x_2 - x_1),
\end{aligned}$$

where $M'$ is the supremum of $|f(x)|$ on $[a, b]$. Thus if $\epsilon > 0$ is given, then $|F(x_2) - F(x_1)| < \epsilon$ provided that $|x_1 - x_2| < \epsilon/M'$. This proves uniform continuity of $F(x)$ on $[a, b]$. □

The next theorem presents a practical way for evaluating the Riemann integral on $[a, b]$.

Theorem 6.4.8. Suppose that $f(x)$ is continuous on $[a, b]$. Let $F(x) = \int_a^x f(t)\,dt$. Then we have the following:

i. $dF(x)/dx = f(x)$, $a \le x \le b$.

ii. $\int_a^b f(x)\,dx = G(b) - G(a)$, where $G(x) = F(x) + c$, and $c$ is an arbitrary constant.

Proof. We have

$$\begin{aligned}
\frac{dF(x)}{dx} &= \frac{d}{dx}\int_a^x f(t)\,dt = \lim_{h\to 0}\frac{1}{h}\left[\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt\right]\\
&= \lim_{h\to 0}\frac{1}{h}\int_x^{x+h} f(t)\,dt, && \text{by Theorem 6.4.4}\\
&= \lim_{h\to 0} f(x + \theta h), && \text{by Theorem 6.4.6},
\end{aligned}$$

where $0 \le \theta \le 1$. Hence,

$$\frac{dF(x)}{dx} = \lim_{h\to 0} f(x + \theta h) = f(x)$$

by the continuity of $f(x)$. This result indicates that an indefinite integral of $f(x)$ is any function whose derivative is equal to $f(x)$. It is therefore unique up to a constant. Thus both $F(x)$ and $F(x) + c$, where $c$ is an arbitrary constant, are considered to be indefinite integrals.






To prove the second part of the theorem, let $G(x)$ be defined on $[a, b]$ as

$$G(x) = F(x) + c = \int_a^x f(t)\,dt + c,$$

that is, $G(x)$ is an indefinite integral of $f(x)$. If $x = a$, then $G(a) = c$, since $F(a) = 0$. Also, if $x = b$, then $G(b) = F(b) + c = \int_a^b f(t)\,dt + G(a)$. It follows that

$$\int_a^b f(t)\,dt = G(b) - G(a).$$

□


This result is known as the fundamental theorem of calculus. It is generally attributed to Isaac Barrow (1630-1677), who was the first to realize that differentiation and integration are inverse operations. One advantage of this theorem is that it provides a practical way to evaluate the integral of $f(x)$ on $[a, b]$.
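Both parts of Theorem 6.4.8 can be checked numerically for one particular continuous function. The sketch below assumes $f(t) = \cos t$ and $a = 0$, approximates $F(x) = \int_0^x \cos t\,dt$ with composite Simpson's rule (a quadrature rule we substitute for the exact integral; it is not part of the theorem), estimates $dF/dx$ by a central difference, and compares $\int_0^{1.2}\cos x\,dx$ with $G(1.2) - G(0)$ for $G = \sin$.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def F(x, f=math.cos, a=0.0):
    """Indefinite integral F(x) = integral of f over [a, x], approximated numerically."""
    return simpson(f, a, x)

# Part i: dF/dx should equal f(x); estimate it with a central difference.
x, h = 0.7, 1e-4
dF = (F(x + h) - F(x - h)) / (2 * h)
print(f"dF/dx at {x}: {dF:.8f}   cos({x}) = {math.cos(x):.8f}")

# Part ii: integral over [0, 1.2] versus G(1.2) - G(0) with G = sin.
print(f"{simpson(math.cos, 0.0, 1.2):.10f}  {math.sin(1.2) - math.sin(0.0):.10f}")
```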

6.4.1. Change of Variables in Riemann Integration

There are situations in which the variable $x$ in a Riemann integral is a function of some other variable, say $u$. In this case, it may be of interest to determine how the integral can be expressed and evaluated under the given transformation. One advantage of this change of variable is the possibility of simplifying the actual evaluation of the integral, provided that the transformation is properly chosen.

Theorem 6.4.9. Let $f(x)$ be continuous on $[\alpha, \beta]$, and let $x = g(u)$ be a function whose derivative $g'(u)$ exists and is continuous on $[c, d]$. Suppose that the range of $g$ is contained inside $[\alpha, \beta]$. If $a$, $b$ are points in $[\alpha, \beta]$ such that $a = g(c)$ and $b = g(d)$, then

$$\int_a^b f(x)\,dx = \int_c^d f[g(u)]\,g'(u)\,du.$$

Proof. Let $F(x) = \int_a^x f(t)\,dt$. By Theorem 6.4.8, $F'(x) = f(x)$. Let $G(u)$ be defined as

$$G(u) = \int_c^u f[g(t)]\,g'(t)\,dt.$$

Since $f$, $g$, and $g'$ are continuous, then by Theorem 6.4.8 we have

$$\frac{dG(u)}{du} = f[g(u)]\,g'(u). \tag{6.17}$$

However, according to the chain rule (Theorem 4.1.3),

$$\frac{dF[g(u)]}{du} = \frac{dF[g(u)]}{dg(u)}\,\frac{dg(u)}{du} = f[g(u)]\,g'(u). \tag{6.18}$$

From formulas (6.17) and (6.18) we conclude that

$$G(u) - F[g(u)] = \lambda, \tag{6.19}$$

where $\lambda$ is a constant. If $a$ and $b$ are points in $[\alpha, \beta]$ such that $a = g(c)$, $b = g(d)$, then when $u = c$, we have $G(c) = 0$ and $\lambda = -F[g(c)] = -F(a) = 0$. Furthermore, when $u = d$, $G(d) = \int_c^d f[g(t)]\,g'(t)\,dt$. From (6.19) we then obtain

$$\begin{aligned}
G(d) = \int_c^d f[g(t)]\,g'(t)\,dt &= F[g(d)] + \lambda\\
&= F(b)\\
&= \int_a^b f(x)\,dx.
\end{aligned}$$

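For example, consider the integral $\int_1^2 t(2t^2 - 1)^{1/2}\,dt$. Let $x = 2t^2 - 1$. Then $dx = 4t\,dt$, and by Theorem 6.4.9,

$$\int_1^2 t(2t^2 - 1)^{1/2}\,dt = \frac{1}{4}\int_1^7 x^{1/2}\,dx.$$

An indefinite integral of $x^{1/2}$ is given by $\frac{2}{3}x^{3/2}$. Hence,

$$\int_1^2 t(2t^2 - 1)^{1/2}\,dt = \frac{1}{4}\left(\frac{2}{3}\right)(7^{3/2} - 1) = \frac{1}{6}(7^{3/2} - 1).$$

□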
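The value $\frac16(7^{3/2} - 1)$ can be cross-checked by integrating both sides of the substitution numerically. A sketch using composite Simpson's rule (our choice of quadrature, with 1000 subintervals on each side):

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

lhs = simpson(lambda t: t * math.sqrt(2 * t * t - 1), 1.0, 2.0)  # original integral in t
rhs = 0.25 * simpson(math.sqrt, 1.0, 7.0)                       # after x = 2t^2 - 1
exact = (7 ** 1.5 - 1) / 6
print(f"{lhs:.10f}  {rhs:.10f}  {exact:.10f}")
```

All three numbers should agree to many decimal places, which is what Theorem 6.4.9 predicts.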

6.5. IMPROPER RIEMANN INTEGRALS 

In our study of the Riemann integral we have only considered integrals of functions that are bounded on a finite interval $[a, b]$. We now extend the scope of Riemann integration to include situations where the integrand can become unbounded at one or more points inside the range of integration, which can also be infinite. In such situations, the Riemann integral is called an improper integral.

There are two kinds of improper integrals. If $f(x)$ is Riemann integrable on $[a, b]$ for any $b > a$, then $\int_a^\infty f(x)\,dx$ is called an improper integral of the first kind, where the range of integration is infinite. If, however, $f(x)$ becomes infinite at a finite number of points inside the range of integration, then the integral $\int_a^b f(x)\,dx$ is said to be improper of the second kind.

Definition 6.5.1. Let $F(z) = \int_a^z f(x)\,dx$. Suppose that $F(z)$ exists for any value of $z$ greater than $a$. If $F(z)$ has a finite limit $L$ as $z \to \infty$, then the improper integral $\int_a^\infty f(x)\,dx$ is said to converge to $L$. In this case, $L$ represents the Riemann integral of $f(x)$ on $[a, \infty)$ and we write

$$L = \int_a^\infty f(x)\,dx.$$

On the other hand, if $F(z) \to \pm\infty$ as $z \to \infty$, then the improper integral $\int_a^\infty f(x)\,dx$ is said to diverge. By the same token, we can define the integral $\int_{-\infty}^a f(x)\,dx$ as the limit, if it exists, of $\int_{-z}^a f(x)\,dx$ as $z \to \infty$. Also, $\int_{-\infty}^\infty f(x)\,dx$ is defined as

$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{z\to\infty}\int_{-z}^{a} f(x)\,dx + \lim_{z\to\infty}\int_{a}^{z} f(x)\,dx,$$

where $a$ is any finite number, provided that both limits exist.

The convergence of $\int_a^\infty f(x)\,dx$ can be determined by using the Cauchy criterion in a manner similar to the one used in the study of convergence of sequences (see Section 5.1.1).


Theorem 6.5.1. The improper integral $\int_a^\infty f(x)\,dx$ converges if and only if for a given $\epsilon > 0$ there exists a $z_0$ such that

$$\left|\int_{z_1}^{z_2} f(x)\,dx\right| < \epsilon, \tag{6.20}$$

whenever $z_1$ and $z_2$ exceed $z_0$.

Proof. If $F(z) = \int_a^z f(x)\,dx$ has a limit $L$ as $z \to \infty$, then for a given $\epsilon > 0$ there exists $z_0$ such that for $z > z_0$,

$$|F(z) - L| < \frac{\epsilon}{2}.$$

Now, if both $z_1$ and $z_2$ exceed $z_0$, then

$$\left|\int_{z_1}^{z_2} f(x)\,dx\right| = |F(z_2) - F(z_1)| \le |F(z_2) - L| + |F(z_1) - L| < \epsilon.$$

Vice versa, if condition (6.20) is satisfied, then we need to show that $F(z)$ has a limit as $z \to \infty$. Let us therefore define the sequence $\{g_n\}_{n=1}^{\infty}$, where $g_n$ is given by

$$g_n = \int_a^{a+n} f(x)\,dx, \qquad n = 1, 2, \ldots.$$

It follows that for any $\epsilon > 0$,

$$|g_n - g_m| = \left|\int_{a+m}^{a+n} f(x)\,dx\right| < \epsilon,$$

if $m$ and $n$ are large enough. This implies that $\{g_n\}_{n=1}^{\infty}$ is a Cauchy sequence; hence it converges by Theorem 5.1.6. Let $g = \lim_{n\to\infty} g_n$. To show that $\lim_{z\to\infty} F(z) = g$, let us write

$$|F(z) - g| = |F(z) - g_n + g_n - g| \le |F(z) - g_n| + |g_n - g|. \tag{6.21}$$

Suppose $\epsilon > 0$ is given. There exists an integer $N_1$ such that $|g_n - g| < \epsilon/2$ if $n > N_1$. Also, there exists an integer $N_2$ such that

$$|F(z) - g_n| = \left|\int_{a+n}^{z} f(x)\,dx\right| < \frac{\epsilon}{2} \tag{6.22}$$

if $z > a + n > N_2$. Thus by choosing $z > a + n$, where $n > \max(N_1, N_2 - a)$, we get from inequalities (6.21) and (6.22)

$$|F(z) - g| < \epsilon.$$

This completes the proof. □

Definition 6.5.2. If the improper integral $\int_a^\infty |f(x)|\,dx$ is convergent, then the integral $\int_a^\infty f(x)\,dx$ is said to be absolutely convergent. If $\int_a^\infty f(x)\,dx$ is convergent but not absolutely, then it is said to be conditionally convergent. □

It is easy to show that an improper integral is convergent if it converges absolutely (apply Theorem 6.5.1 together with the inequality $\left|\int_{z_1}^{z_2} f(x)\,dx\right| \le \int_{z_1}^{z_2} |f(x)|\,dx$ from Theorem 6.4.5).

As with the case of series of positive terms, there are comparison tests that can be used to test for convergence of improper integrals of the first kind of nonnegative functions. These tests are described in the following theorems.

Theorem 6.5.2. Let $f(x)$ be a nonnegative function that is Riemann integrable on $[a, b]$ for every $b > a$. Suppose that there exists a function $g(x)$ such that $f(x) \le g(x)$ for $x \ge a$. If $\int_a^\infty g(x)\,dx$ converges, then so does $\int_a^\infty f(x)\,dx$ and we have

$$\int_a^\infty f(x)\,dx \le \int_a^\infty g(x)\,dx.$$

Proof. See Exercise 6.7. □


Theorem 6.5.3. Let $f(x)$ and $g(x)$ be nonnegative functions that are Riemann integrable on $[a, b]$ for every $b > a$. If

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = k,$$

where $k$ is a positive constant, then $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$ are either both convergent or both divergent.

Proof. See Exercise 6.8. □

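Example 6.5.1. Consider the integral $\int_1^\infty e^{-x}x^2\,dx$. We have that $e^x = 1 + \sum_{n=1}^{\infty} (x^n/n!)$. Hence, for $x \ge 1$, $e^x > x^p/p!$, where $p$ is any positive integer. If $p$ is chosen such that $p - 2 \ge 2$, then

$$e^{-x}x^2 < \frac{p!}{x^{p-2}} \le \frac{p!}{x^2}.$$

However, $\int_1^\infty (dx/x^2) = [-1/x]_1^\infty = 1$. Therefore, by Theorem 6.5.2, the integral of $e^{-x}x^2$ on $[1, \infty)$ is convergent.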
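Example 6.5.1 only establishes convergence. Since $\frac{d}{dx}\bigl[-(x^2 + 2x + 2)e^{-x}\bigr] = x^2e^{-x}$, Theorem 6.4.8 also gives $F(z) = \int_1^z e^{-x}x^2\,dx$ in closed form, and its limit is $5/e$. The sketch below tabulates $F(z)$ for growing $z$ and spot-checks the comparison bound from the text with $p = 4$; the names `G` and `F` are ours.

```python
import math

def G(x):
    """An antiderivative of x**2 * exp(-x)."""
    return -(x * x + 2.0 * x + 2.0) * math.exp(-x)

def F(z):
    """F(z) = integral of exp(-x) * x**2 over [1, z] (Theorem 6.4.8)."""
    return G(z) - G(1.0)

for z in (2, 5, 10, 20, 40):
    print(f"z={z:>2}  F(z)={F(z):.10f}")
print(f"limit 5/e = {5 / math.e:.10f}")

# Comparison used in the text with p = 4: exp(-x) x^2 < 4!/x^2 for x >= 1,
# so the improper integral is at most 4! * 1 = 24.
for x in (1.0, 3.0, 10.0):
    assert math.exp(-x) * x**2 < math.factorial(4) / x**2
```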


Example 6.5.2. The integral $\int_0^\infty [(\sin x)/(x+1)^2]\,dx$ is absolutely convergent, since

$$\frac{|\sin x|}{(x+1)^2} \le \frac{1}{(x+1)^2}$$

and

$$\int_0^\infty \frac{dx}{(x+1)^2} = \left[-\frac{1}{x+1}\right]_0^\infty = 1.$$

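Example 6.5.3. The integral $\int_0^\infty (\sin x/x)\,dx$ is conditionally convergent.

We first show that $\int_0^\infty (\sin x/x)\,dx$ is convergent. We have that

$$\int_0^\infty \frac{\sin x}{x}\,dx = \int_0^1 \frac{\sin x}{x}\,dx + \int_1^\infty \frac{\sin x}{x}\,dx. \tag{6.23}$$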





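By Exercise 6.3, $(\sin x)/x$ is Riemann integrable on $[0, 1]$, since it is continuous there except at $x = 0$, which is a discontinuity of the first kind (see Definition 3.4.2). As for the second integral in (6.23), we have for $z_2 > z_1 > 1$,

$$\begin{aligned}
\int_{z_1}^{z_2} \frac{\sin x}{x}\,dx &= \left[-\frac{\cos x}{x}\right]_{z_1}^{z_2} - \int_{z_1}^{z_2} \frac{\cos x}{x^2}\,dx\\
&= \frac{\cos z_1}{z_1} - \frac{\cos z_2}{z_2} - \int_{z_1}^{z_2} \frac{\cos x}{x^2}\,dx.
\end{aligned}$$

Thus

$$\left|\int_{z_1}^{z_2} \frac{\sin x}{x}\,dx\right| \le \frac{1}{z_1} + \frac{1}{z_2} + \int_{z_1}^{z_2} \frac{dx}{x^2} = \frac{2}{z_1}.$$

Since $2/z_1$ can be made arbitrarily small by choosing $z_1$ large enough, then by Theorem 6.5.1, $\int_1^\infty (\sin x/x)\,dx$ is convergent and so is $\int_0^\infty (\sin x/x)\,dx$.

It remains to show that $\int_0^\infty (\sin x/x)\,dx$ is not absolutely convergent. This follows from the fact that (see Exercise 6.10)

$$\lim_{n\to\infty} \int_0^{n\pi} \left|\frac{\sin x}{x}\right| dx = \infty.$$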
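The contrast between $\int_0^{n\pi}(\sin x/x)\,dx$ and $\int_0^{n\pi}|\sin x/x|\,dx$ shows up clearly in a numerical sketch. On each arch $[k\pi, (k+1)\pi]$ the integrand keeps one sign, so the absolute integral is the sum of the absolute values of the arch integrals; each arch is handled with composite Simpson's rule (our choice of quadrature). The signed column should approach $\pi/2$, the known value of this integral, while the absolute column grows without bound, at least as fast as $(2/\pi)(1 + \frac12 + \cdots + \frac1n)$, since $|\sin x|/x \ge |\sin x|/((k+1)\pi)$ on the $k$th arch.

```python
import math

def sinc(x):
    """sin(x)/x, with the removable discontinuity at 0 filled by its limit 1."""
    return math.sin(x) / x if x != 0 else 1.0

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def partial_integrals(n, per_arch=200):
    """Return (integral of sin x/x, integral of |sin x/x|) over [0, n*pi]."""
    signed = absolute = 0.0
    for k in range(n):
        piece = simpson(sinc, k * math.pi, (k + 1) * math.pi, per_arch)
        signed += piece
        absolute += abs(piece)  # sin x/x has one sign on each arch
    return signed, absolute

for n in (1, 10, 100, 1000):
    s, t = partial_integrals(n)
    lower = 2 / math.pi * sum(1 / k for k in range(1, n + 1))
    print(f"n={n:>4}  signed={s:.6f}  absolute={t:.4f}  (2/pi)H_n={lower:.4f}")
print(f"pi/2 = {math.pi / 2:.6f}")
```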


Convergence of improper integrals of the first kind can be used to determine convergence of series of positive terms (see Section 5.2.1). This is based on the next theorem.


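Theorem 6.5.4 (Maclaurin’s Integral Test). Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms such that $a_{n+1} \le a_n$ for $n \ge 1$. Let $f(x)$ be a positive nonincreasing function defined on $[1, \infty)$ such that $f(n) = a_n$, $n = 1, 2, \ldots$, and $f(x) \to 0$ as $x \to \infty$. Then, $\sum_{n=1}^{\infty} a_n$ converges if and only if the improper integral $\int_1^\infty f(x)\,dx$ converges.

Proof. If $n \ge 1$ and $n \le x \le n + 1$, then

$$a_n = f(n) \ge f(x) \ge f(n+1) = a_{n+1}.$$

By Theorem 6.4.2 we have for $n \ge 1$

$$a_n \ge \int_n^{n+1} f(x)\,dx \ge a_{n+1}. \tag{6.24}$$

If $s_n = \sum_{k=1}^{n} a_k$ is the $n$th partial sum of the series, then from inequality (6.24) we obtain

$$s_n \ge \int_1^{n+1} f(x)\,dx \ge s_{n+1} - a_1. \tag{6.25}$$

If the series $\sum_{n=1}^{\infty} a_n$ converges to the sum $s$, then $s_n \le s$ for all $n$. Consequently, the sequence whose $n$th term is $F(n+1) = \int_1^{n+1} f(x)\,dx$ is monotone increasing and is bounded by $s$; hence it must have a limit. Since $F(z) = \int_1^z f(x)\,dx$ is itself monotone increasing in $z$ (because $f$ is positive), this limit is also $\lim_{z\to\infty} F(z)$. Therefore, the integral $\int_1^\infty f(x)\,dx$ converges.

Now, let us suppose that $\int_1^\infty f(x)\,dx$ is convergent and is equal to $L$. Then from inequality (6.25) we obtain

$$s_{n+1} \le a_1 + \int_1^{n+1} f(x)\,dx \le a_1 + L, \qquad n \ge 1, \tag{6.26}$$

since $f(x)$ is positive. Inequality (6.26) indicates that the monotone increasing sequence $\{s_n\}_{n=1}^{\infty}$ is bounded; hence it has a limit, which is the sum of the series. □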

Theorem 6.5.4 provides a test of convergence for a series of positive terms. Of course, the usefulness of this test depends on how easy it is to integrate the function f(x).

As an example of using the integral test, consider the harmonic series $\sum_{n=1}^\infty (1/n)$. If f(x) is defined as $f(x) = 1/x$, $x \ge 1$, then $F(x) = \int_1^x f(t)\,dt = \log x$. Since F(x) goes to infinity as $x \to \infty$, the harmonic series must therefore be divergent, as was shown in Chapter 5. On the other hand, the series $\sum_{n=1}^\infty (1/n^2)$ is convergent, since $F(x) = \int_1^x (dt/t^2) = 1 - 1/x$, which converges to 1 as $x \to \infty$.
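The two bounds obtained by summing (6.24) can be checked numerically. The sketch below (the helper name `partial_sums` and the cutoff N are illustrative) verifies, for the two series just discussed, that the harmonic partial sums dominate $\log(n+1) = \int_1^{n+1} dx/x$, which is unbounded, while the partial sums of $\sum 1/n^2$ never exceed $a_1 + L = 1 + 1 = 2$, as the argument behind (6.26) requires.

```python
import math


def partial_sums(term, n):
    """Return [s_1, ..., s_n] for the series sum_{k>=1} term(k)."""
    s, sums = 0.0, []
    for k in range(1, n + 1):
        s += term(k)
        sums.append(s)
    return sums


N = 10_000
harmonic = partial_sums(lambda k: 1.0 / k, N)
squares = partial_sums(lambda k: 1.0 / k ** 2, N)

# Lower bound from summing (6.24) with f(x) = 1/x: s_n >= log(n + 1), which is unbounded.
print(harmonic[-1], math.log(N + 1))
# Upper bound with f(x) = 1/x^2: s_n <= a_1 + L = 1 + 1 = 2.
print(squares[-1], max(squares) <= 2.0)
```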

6.5.1. Improper Riemann Integrals of the Second Kind

Let us now consider integrals of the form $\int_a^b f(x)\,dx$ where [a, b] is a finite interval and the integrand becomes infinite at a finite number of points inside [a, b]. Such integrals are called improper integrals of the second kind. Suppose, for example, that $f(x) \to \infty$ as $x \to a^+$. Then $\int_a^b f(x)\,dx$ is said to converge if the limit

$$\lim_{\epsilon \to 0^+} \int_{a+\epsilon}^{b} f(x)\,dx$$

exists and is finite. Similarly, if $f(x) \to \infty$ as $x \to b^-$, then $\int_a^b f(x)\,dx$ is convergent if the limit

$$\lim_{\epsilon \to 0^+} \int_{a}^{b-\epsilon} f(x)\,dx$$

exists and is finite. Furthermore, if $f(x) \to \infty$ as $x \to c$, where $a < c < b$, then $\int_a^b f(x)\,dx$ is the sum of $\int_a^c f(x)\,dx$ and $\int_c^b f(x)\,dx$ provided that both integrals converge. By definition, if $f(x) \to \infty$ as $x \to x_0$, where $x_0 \in [a, b]$, then $x_0$ is said to be a singularity of f(x).



The following theorems can help in determining convergence of integrals of the second kind. They are similar to Theorems 6.5.1, 6.5.2, and 6.5.3. Their proofs will therefore be omitted.

Theorem 6.5.5. If $f(x) \to \infty$ as $x \to a^+$, then $\int_a^b f(x)\,dx$ converges if and only if for a given $\epsilon > 0$ there exists a $z_0$ such that

$$\left|\int_{z_1}^{z_2} f(x)\,dx\right| < \epsilon,$$

where $z_1$ and $z_2$ are any two numbers such that $a < z_1 < z_2 < z_0 < b$.

Theorem 6.5.6. Let f(x) be a nonnegative function such that $\int_c^b f(x)\,dx$ exists for every c in (a, b]. If there exists a function g(x) such that $f(x) \le g(x)$ for all x in (a, b], and if $\int_c^b g(x)\,dx$ converges as $c \to a^+$, then so does $\int_c^b f(x)\,dx$ and we have

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$


Theorem 6.5.7. Let f(x) and g(x) be nonnegative functions that are Riemann integrable on [c, b] for every c such that $a < c \le b$. If

$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = k,$$

where k is a positive constant, then $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$ are either both convergent or both divergent.


Definition 6.5.3. Let $\int_a^b f(x)\,dx$ be an improper integral of the second kind. If $\int_a^b |f(x)|\,dx$ converges, then $\int_a^b f(x)\,dx$ is said to converge absolutely. If, however, $\int_a^b f(x)\,dx$ is convergent, but not absolutely, then it is said to be conditionally convergent. □

Theorem 6.5.8. If $\int_a^b |f(x)|\,dx$ converges, then so does $\int_a^b f(x)\,dx$.


Example 6.5.4. Consider the integral $\int_0^1 e^{-x} x^{n-1}\,dx$, where n > 0. If 0 < n < 1, then the integral is improper of the second kind, since $x^{n-1} \to \infty$ as $x \to 0^+$. Thus, x = 0 is a singularity of the integrand. Since

$$\lim_{x \to 0^+} \frac{e^{-x} x^{n-1}}{x^{n-1}} = 1,$$

then, by Theorem 6.5.7, the behavior of $\int_0^1 e^{-x} x^{n-1}\,dx$ with regard to convergence or divergence is the same as that of $\int_0^1 x^{n-1}\,dx$. But $\int_0^1 x^{n-1}\,dx = (1/n)[x^n]_0^1 = 1/n$ is convergent, and so is $\int_0^1 e^{-x} x^{n-1}\,dx$.
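A quick numerical check of this example is sketched below. It uses the substitution $x = t^{1/n}$, under which $x^{n-1}\,dx = dt/n$, so the singular integral becomes $(1/n)\int_0^1 \exp(-t^{1/n})\,dt$ with a bounded integrand that Simpson's rule handles well. Since $e^{-1} \le e^{-x} \le 1$ on [0, 1], the value must lie between $e^{-1}/n$ and $1/n$; for n = 1/2 it should also match the closed form $\sqrt{\pi}\,\mathrm{erf}(1)$. The function names are illustrative.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3


def example_654(n):
    """Integral of e^{-x} x^{n-1} over [0, 1] for 0 < n < 1.

    With x = t**(1/n) we get x**(n-1) dx = dt / n, so the integral equals
    (1/n) * integral_0^1 exp(-t**(1/n)) dt, whose integrand is bounded.
    """
    return simpson(lambda t: math.exp(-t ** (1.0 / n)), 0.0, 1.0) / n


for n in (0.25, 0.5, 0.8):
    print(n, math.exp(-1) / n, example_654(n), 1 / n)   # lower bound, value, upper bound
print(example_654(0.5), math.sqrt(math.pi) * math.erf(1.0))
```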



Example 6.5.5. $\int_0^1 (\sin x/x^2)\,dx$. The integrand has a singularity at x = 0. Let $g(x) = 1/x$. Then $(\sin x)/[x^2 g(x)] \to 1$ as $x \to 0^+$. But $\int_0^1 (dx/x) = [\log x]_0^1$ is divergent, since $\log x \to -\infty$ as $x \to 0^+$. Therefore, by Theorem 6.5.7, $\int_0^1 (\sin x/x^2)\,dx$ is divergent.

Example 6.5.6. Consider the integral $\int_0^2 (x^2 - 3x + 1)/[x(x-1)^2]\,dx$. Here, the integrand has two singularities, namely x = 0 and x = 1, inside [0, 2]. We can therefore write

$$\int_0^2 \frac{x^2 - 3x + 1}{x(x-1)^2}\,dx = \lim_{t \to 0^+} \int_t^{1/2} \frac{x^2 - 3x + 1}{x(x-1)^2}\,dx + \lim_{u \to 1^-} \int_{1/2}^{u} \frac{x^2 - 3x + 1}{x(x-1)^2}\,dx + \lim_{v \to 1^+} \int_{v}^{2} \frac{x^2 - 3x + 1}{x(x-1)^2}\,dx.$$

We note that

$$\frac{x^2 - 3x + 1}{x(x-1)^2} = \frac{1}{x} - \frac{1}{(x-1)^2}.$$

Hence,

$$\int_0^2 \frac{x^2 - 3x + 1}{x(x-1)^2}\,dx = \lim_{t \to 0^+} \left[\log x + \frac{1}{x-1}\right]_t^{1/2} + \lim_{u \to 1^-} \left[\log x + \frac{1}{x-1}\right]_{1/2}^{u} + \lim_{v \to 1^+} \left[\log x + \frac{1}{x-1}\right]_{v}^{2}.$$

None of the above limits exists as a finite number. This integral is therefore divergent.
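The partial-fraction identity and the blow-up of the middle limit can both be checked directly. In the sketch below (function names are illustrative), the identity is verified in exact rational arithmetic with Python's `fractions.Fraction`; after clearing the denominator $x(x-1)^2$ it reduces to $x^2 - 3x + 1 = (x-1)^2 - x$, a polynomial identity of degree 2, so agreement at three distinct points already suffices. The antiderivative $\log x + 1/(x-1)$ then shows $\int_{1/2}^{u}$ tending to $-\infty$ as $u \to 1^-$.

```python
import math
from fractions import Fraction


def integrand(x):
    return (x ** 2 - 3 * x + 1) / (x * (x - 1) ** 2)


def decomposed(x):
    return 1 / x - 1 / (x - 1) ** 2


# Exact rational arithmetic: no rounding can hide a mismatch.
points = [Fraction(k, 7) for k in (1, 3, 5, 9, 12, 20)]
identity_holds = all(integrand(x) == decomposed(x) for x in points)
print(identity_holds)


def F(x):
    """Antiderivative log x + 1/(x - 1) of the integrand on (0, 1) and on (1, 2]."""
    return math.log(x) + 1 / (x - 1)


# Middle piece of the example: the integral over [1/2, u] as u -> 1 from the left.
for u in (0.9, 0.99, 0.999, 0.9999):
    print(u, F(u) - F(0.5))
```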


6.6. CONVERGENCE OF A SEQUENCE OF RIEMANN INTEGRALS

In the present section we confine our attention to the limiting behavior of integrals of a sequence of functions $\{f_n(x)\}_{n=1}^\infty$.

Theorem 6.6.1. Suppose that $f_n(x)$ is Riemann integrable on [a, b] for $n \ge 1$. If $f_n(x)$ converges uniformly to f(x) on [a, b] as $n \to \infty$, then f(x) is Riemann integrable on [a, b] and

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$


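Proof. Let us first show that f(x) is Riemann integrable on [a, b]. Let $\epsilon > 0$ be given. Since $f_n(x)$ converges uniformly to f(x), then there exists an integer $n_0$ that depends only on $\epsilon$ such that

$$|f_n(x) - f(x)| < \frac{\epsilon}{3(b-a)} \qquad (6.27)$$

if $n > n_0$ for all $x \in [a, b]$. Let $n_1 > n_0$. Since $f_{n_1}(x)$ is Riemann integrable on [a, b], then by Theorem 6.2.1 there exists a $\delta > 0$ such that

$$US_P(f_{n_1}) - LS_P(f_{n_1}) < \frac{\epsilon}{3} \qquad (6.28)$$

for any partition P of [a, b] with a norm $\Delta_p < \delta$. Now, from inequality (6.27) we have

$$f_{n_1}(x) - \frac{\epsilon}{3(b-a)} < f(x) < f_{n_1}(x) + \frac{\epsilon}{3(b-a)}, \qquad x \in [a, b].$$

We conclude that

$$US_P(f) \le US_P(f_{n_1}) + \frac{\epsilon}{3}, \qquad (6.29)$$

$$LS_P(f) \ge LS_P(f_{n_1}) - \frac{\epsilon}{3}. \qquad (6.30)$$

From inequalities (6.28), (6.29), and (6.30) it follows that if $\Delta_p < \delta$, then

$$US_P(f) - LS_P(f) \le US_P(f_{n_1}) - LS_P(f_{n_1}) + \frac{2\epsilon}{3} < \epsilon. \qquad (6.31)$$

Inequality (6.31) shows that f(x) is Riemann integrable on [a, b], again by Theorem 6.2.1.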



Let us now show that

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx. \qquad (6.32)$$

From inequality (6.27) we have for $n > n_0$

$$\left|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx\right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \frac{\epsilon}{3} < \epsilon,$$

and the result follows, since $\epsilon$ is an arbitrary positive number. □
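Uniform convergence cannot simply be dropped from Theorem 6.6.1. The sketch below uses a standard counterexample that does not appear in the text: $f_n(x) = n x e^{-n x^2}$ on [0, 1] converges to 0 at every point, but not uniformly (its maximum $\sqrt{n/2}\,e^{-1/2}$, attained at $x = 1/\sqrt{2n}$, grows without bound), and $\int_0^1 f_n(x)\,dx = (1 - e^{-n})/2 \to 1/2 \ne 0$. Simpson's rule is used only to confirm the closed form; the function names are illustrative.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3


def f_n(n):
    """f_n(x) = n x exp(-n x^2): tends to 0 at every x in [0, 1], but not uniformly."""
    return lambda x: n * x * math.exp(-n * x * x)


for n in (1, 10, 100, 1000):
    integral = simpson(f_n(n), 0.0, 1.0)
    closed_form = (1 - math.exp(-n)) / 2        # tends to 1/2, not to the integral of 0
    peak = f_n(n)(1 / math.sqrt(2 * n))          # sup of f_n on [0, 1] = sqrt(n/2) e^{-1/2}
    print(n, round(integral, 8), round(closed_form, 8), round(peak, 3))
```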


6.7. SOME FUNDAMENTAL INEQUALITIES

In this section we consider certain well-known inequalities for the Riemann integral.

6.7.1. The Cauchy-Schwarz Inequality

Theorem 6.7.1. Suppose that f(x) and g(x) are such that $f^2(x)$ and $g^2(x)$ are Riemann integrable on [a, b]. Then

$$\left[\int_a^b f(x)g(x)\,dx\right]^2 \le \left[\int_a^b f^2(x)\,dx\right]\left[\int_a^b g^2(x)\,dx\right]. \qquad (6.33)$$

The limits of integration may be finite or infinite.

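Proof. Let $c_1$ and $c_2$ be constants, not both zero. Without loss of generality, let us assume that $c_2 \ne 0$. Then

$$\int_a^b [c_1 f(x) + c_2 g(x)]^2\,dx \ge 0.$$

Thus the quadratic form

$$c_1^2 \int_a^b f^2(x)\,dx + 2c_1 c_2 \int_a^b f(x)g(x)\,dx + c_2^2 \int_a^b g^2(x)\,dx$$

is nonnegative for all $c_1$ and $c_2$. It follows that its discriminant as a quadratic in $c_1$, which up to a factor of 4 is

$$c_2^2\left[\int_a^b f(x)g(x)\,dx\right]^2 - c_2^2\left[\int_a^b f^2(x)\,dx\right]\left[\int_a^b g^2(x)\,dx\right],$$

must be nonpositive, that is,

$$\left[\int_a^b f(x)g(x)\,dx\right]^2 \le \left[\int_a^b f^2(x)\,dx\right]\left[\int_a^b g^2(x)\,dx\right].$$

It is easy to see that if f(x) and g(x) are linearly related [that is, there exist constants $\lambda_1$ and $\lambda_2$, not both zero, such that $\lambda_1 f(x) + \lambda_2 g(x) = 0$], then inequality (6.33) becomes an equality. □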
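Inequality (6.33) can be checked numerically with midpoint Riemann sums, which are equally weighted sums and so satisfy the discrete Cauchy-Schwarz inequality exactly. The sketch below (function names illustrative) uses $f(x) = e^x$, $g(x) = x$ on [0, 1], where $\int_0^1 x e^x\,dx = 1$ while the right-hand side is $\frac{e^2 - 1}{2} \cdot \frac{1}{3} \approx 1.065$, and then the linearly related pair $g(x) = 2f(x)$, where equality holds.

```python
import math


def riemann(h, a, b, n=20_000):
    """Midpoint Riemann sum approximating the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx


def cauchy_schwarz_sides(f, g, a, b):
    """Return (left side, right side) of (6.33), each integral approximated by riemann()."""
    lhs = riemann(lambda x: f(x) * g(x), a, b) ** 2
    rhs = riemann(lambda x: f(x) ** 2, a, b) * riemann(lambda x: g(x) ** 2, a, b)
    return lhs, rhs


strict = cauchy_schwarz_sides(math.exp, lambda x: x, 0.0, 1.0)
equal = cauchy_schwarz_sides(math.exp, lambda x: 2 * math.exp(x), 0.0, 1.0)
print(strict)   # about (1.0, 1.065)
print(equal)    # the two sides agree up to rounding
```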


6.7.2. Hölder's Inequality

This is a generalization of the Cauchy-Schwarz inequality due to Otto Hölder (1859-1937). To prove Hölder's inequality we need the following lemmas:

Lemma 6.7.1. Let $a_1, a_2, \ldots, a_n$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$ be nonnegative numbers such that $\sum_{i=1}^n \lambda_i = 1$. Then

$$\prod_{i=1}^n a_i^{\lambda_i} \le \sum_{i=1}^n \lambda_i a_i. \qquad (6.34)$$

The right-hand side of inequality (6.34) is a weighted arithmetic mean of the $a_i$'s, and the left-hand side is a weighted geometric mean.

Proof. This lemma is an extension of a result given in Section 3.7 concerning the properties of convex functions (see Exercise 6.19). □

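Lemma 6.7.2. Suppose that $f_1(x), f_2(x), \ldots, f_n(x)$ are nonnegative and Riemann integrable on [a, b]. If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are nonnegative numbers such that $\sum_{i=1}^n \lambda_i = 1$, then

$$\int_a^b \prod_{i=1}^n f_i^{\lambda_i}(x)\,dx \le \prod_{i=1}^n \left[\int_a^b f_i(x)\,dx\right]^{\lambda_i}. \qquad (6.35)$$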


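Proof. Without loss of generality, let us assume that $\int_a^b f_i(x)\,dx > 0$ for i = 1, 2, ..., n [inequality (6.35) is obviously true if at least one $f_i(x)$ is identically equal to zero]. By Lemma 6.7.1 we have

$$\frac{\int_a^b \prod_{i=1}^n f_i^{\lambda_i}(x)\,dx}{\prod_{i=1}^n \left[\int_a^b f_i(x)\,dx\right]^{\lambda_i}} = \int_a^b \left[\frac{f_1(x)}{\int_a^b f_1(x)\,dx}\right]^{\lambda_1} \left[\frac{f_2(x)}{\int_a^b f_2(x)\,dx}\right]^{\lambda_2} \cdots \left[\frac{f_n(x)}{\int_a^b f_n(x)\,dx}\right]^{\lambda_n} dx \le \int_a^b \sum_{i=1}^n \frac{\lambda_i f_i(x)}{\int_a^b f_i(x)\,dx}\,dx = \sum_{i=1}^n \lambda_i = 1.$$

Hence, inequality (6.35) follows. □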


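Theorem 6.7.2 (Hölder's Inequality). Let p and q be two positive numbers such that $1/p + 1/q = 1$. If $|f(x)|^p$ and $|g(x)|^q$ are Riemann integrable on [a, b], then

$$\left|\int_a^b f(x)g(x)\,dx\right| \le \left[\int_a^b |f(x)|^p\,dx\right]^{1/p} \left[\int_a^b |g(x)|^q\,dx\right]^{1/q}.$$

Proof. Define the functions

$$u(x) = |f(x)|^p, \qquad v(x) = |g(x)|^q.$$

Then, by Lemma 6.7.2 (with n = 2, $\lambda_1 = 1/p$, and $\lambda_2 = 1/q$),

$$\int_a^b u(x)^{1/p} v(x)^{1/q}\,dx \le \left[\int_a^b u(x)\,dx\right]^{1/p} \left[\int_a^b v(x)\,dx\right]^{1/q},$$

that is,

$$\int_a^b |f(x)|\,|g(x)|\,dx \le \left[\int_a^b |f(x)|^p\,dx\right]^{1/p} \left[\int_a^b |g(x)|^q\,dx\right]^{1/q}. \qquad (6.36)$$

The theorem follows from inequality (6.36) and the fact that

$$\left|\int_a^b f(x)g(x)\,dx\right| \le \int_a^b |f(x)|\,|g(x)|\,dx.$$

We note that the Cauchy-Schwarz inequality can be deduced from Theorem 6.7.2 by taking p = q = 2. □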
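A numerical check of Hölder's inequality for several exponents is sketched below. The conjugate exponent is computed as $q = p/(p-1)$, which is equivalent to $1/p + 1/q = 1$; the integrals are midpoint Riemann sums, and the test functions $f(x) = 1 + x$ and $g(x) = \sin(\pi x) - 0.3$ on [0, 1] are arbitrary illustrative choices (g changes sign, so the absolute values matter). Taking p = 2 gives the Cauchy-Schwarz case.

```python
import math


def riemann(h, a, b, n=20_000):
    """Midpoint Riemann sum approximating the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx


def holder_sides(f, g, a, b, p):
    """Return (|integral of f g|, right side of Holder's inequality) for an exponent p > 1."""
    q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
    lhs = abs(riemann(lambda x: f(x) * g(x), a, b))
    rhs = (riemann(lambda x: abs(f(x)) ** p, a, b) ** (1 / p)
           * riemann(lambda x: abs(g(x)) ** q, a, b) ** (1 / q))
    return lhs, rhs


def f(x):
    return 1 + x


def g(x):
    return math.sin(math.pi * x) - 0.3   # changes sign on [0, 1]


results = {p: holder_sides(f, g, 0.0, 1.0, p) for p in (1.5, 2.0, 3.0, 6.0)}
for p, (lhs, rhs) in results.items():
    print(p, lhs, rhs)
```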



6.7.3. Minkowski's Inequality

The following inequality is due to Hermann Minkowski (1864-1909).

Theorem 6.7.3. Suppose that f(x) and g(x) are functions such that $|f(x)|^p$ and $|g(x)|^p$ are Riemann integrable on [a, b], where $1 \le p < \infty$. Then

$$\left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/p} \le \left[\int_a^b |f(x)|^p\,dx\right]^{1/p} + \left[\int_a^b |g(x)|^p\,dx\right]^{1/p}.$$


dx 


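Proof. The theorem is obviously true if p = 1 by the triangle inequality. We therefore assume that p > 1. Let q be a positive number such that $1/p + 1/q = 1$. Hence, $p = p(1/p + 1/q) = 1 + p/q$. Let us now write

$$|f(x) + g(x)|^p = |f(x) + g(x)|\,|f(x) + g(x)|^{p/q} \le |f(x)|\,|f(x) + g(x)|^{p/q} + |g(x)|\,|f(x) + g(x)|^{p/q}. \qquad (6.37)$$

By applying Hölder's inequality to the two terms on the right-hand side of inequality (6.37), and noting that $\left(|f(x) + g(x)|^{p/q}\right)^q = |f(x) + g(x)|^p$, we obtain

$$\int_a^b |f(x)|\,|f(x) + g(x)|^{p/q}\,dx \le \left[\int_a^b |f(x)|^p\,dx\right]^{1/p} \left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/q}, \qquad (6.38)$$

$$\int_a^b |g(x)|\,|f(x) + g(x)|^{p/q}\,dx \le \left[\int_a^b |g(x)|^p\,dx\right]^{1/p} \left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/q}. \qquad (6.39)$$

From inequalities (6.37), (6.38), and (6.39) we conclude that

$$\int_a^b |f(x) + g(x)|^p\,dx \le \left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/q} \left\{\left[\int_a^b |f(x)|^p\,dx\right]^{1/p} + \left[\int_a^b |g(x)|^p\,dx\right]^{1/p}\right\}. \qquad (6.40)$$

We may assume that $\int_a^b |f(x) + g(x)|^p\,dx > 0$, since otherwise there is nothing to prove. Dividing both sides of (6.40) by $\left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/q}$ and using $1 - 1/q = 1/p$, inequality (6.40) can be written as

$$\left[\int_a^b |f(x) + g(x)|^p\,dx\right]^{1/p} \le \left[\int_a^b |f(x)|^p\,dx\right]^{1/p} + \left[\int_a^b |g(x)|^p\,dx\right]^{1/p}.$$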


Minkowski's inequality can be extended to integrals involving more than two functions. It can be shown (see Exercise 6.20) that if $|f_i(x)|^p$ is Riemann integrable on [a, b] for i = 1, 2, ..., n, then

$$\left[\int_a^b \left|\sum_{i=1}^n f_i(x)\right|^p dx\right]^{1/p} \le \sum_{i=1}^n \left[\int_a^b |f_i(x)|^p\,dx\right]^{1/p}.$$

□
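The n-function form of Minkowski's inequality can be checked the same way. In the sketch below, `lp_norm` (an illustrative name) approximates $\left[\int_a^b |h(x)|^p\,dx\right]^{1/p}$ with a midpoint sum; the three functions on [0, 1] are arbitrary choices made for illustration, and p = 1 is included to show the triangle-inequality case.

```python
import math


def lp_norm(h, a, b, p, n=20_000):
    """[integral of |h|^p over [a, b]]^(1/p), with the integral as a midpoint sum."""
    dx = (b - a) / n
    return (sum(abs(h(a + (i + 0.5) * dx)) ** p for i in range(n)) * dx) ** (1 / p)


def f1(x):
    return 1 + x


def f2(x):
    return math.sin(math.pi * x) - 0.3


def f3(x):
    return -x ** 2


fs = [f1, f2, f3]


def total(x):
    """f1(x) + f2(x) + f3(x)."""
    return sum(fi(x) for fi in fs)


results = {}
for p in (1.0, 1.5, 2.0, 4.0):
    lhs = lp_norm(total, 0.0, 1.0, p)
    rhs = sum(lp_norm(fi, 0.0, 1.0, p) for fi in fs)
    results[p] = (lhs, rhs)
    print(p, lhs, rhs)
```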


6.7.4. Jensen's Inequality

Theorem 6.7.4. Let X be a random variable with a finite expected value, $\mu = E(X)$. If $\phi(x)$ is a twice differentiable convex function, then

$$E[\phi(X)] \ge \phi[E(X)].$$

Proof. Since $\phi(x)$ is convex and $\phi''(x)$ exists, then we must have $\phi''(x) \ge 0$. By applying the mean value theorem (Theorem 4.2.2) around $\mu$ we obtain

$$\phi(X) = \phi(\mu) + (X - \mu)\phi'(c),$$

where c is between $\mu$ and X. If $X - \mu > 0$, then $c > \mu$ and hence $\phi'(c) \ge \phi'(\mu)$, since $\phi''(x)$ is nonnegative. Thus,

$$\phi(X) - \phi(\mu) = (X - \mu)\phi'(c) \ge (X - \mu)\phi'(\mu). \qquad (6.41)$$

On the other hand, if $X - \mu < 0$, then $c < \mu$ and $\phi'(c) \le \phi'(\mu)$. Hence,

$$\phi(X) - \phi(\mu) = (X - \mu)\phi'(c) \ge (X - \mu)\phi'(\mu). \qquad (6.42)$$

If $X = \mu$, both sides are zero. From inequalities (6.41) and (6.42) we conclude that

$$E[\phi(X) - \phi(\mu)] \ge \phi'(\mu)E(X - \mu) = 0,$$

which implies that

$$E[\phi(X)] \ge \phi(\mu),$$

since $E[\phi(\mu)] = \phi(\mu)$. □
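For a quick numerical check, the sketch below uses a small discrete distribution (its values and probabilities are hypothetical, chosen only for illustration; the theorem as stated does not require X to be continuous) and three convex functions, comparing $E[\phi(X)]$ with $\phi(\mu)$.

```python
import math

# A hypothetical discrete distribution, used only for illustration.
values = [-1.0, 0.5, 2.0, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]                  # nonnegative and summing to 1
mu = sum(p * x for p, x in zip(probs, values))


def expect(phi):
    """E[phi(X)] for the discrete distribution above."""
    return sum(p * phi(x) for p, x in zip(probs, values))


convex = {
    "x^2": lambda x: x * x,
    "exp": math.exp,
    "(x - 1)^4": lambda x: (x - 1) ** 4,
}
for name, phi in convex.items():
    print(name, expect(phi), phi(mu))         # E[phi(X)] versus phi(E X)
```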



6.8. RIEMANN-STIELTJES INTEGRAL

In this section we consider a more general integral, namely the Riemann-Stieltjes integral. The concept on which this integral is based can be attributed to a combination of ideas by Georg Friedrich Riemann (1826-1866) and the Dutch mathematician Thomas Joannes Stieltjes (1856-1894).

The Riemann-Stieltjes integral involves two functions f(x) and g(x), both defined on the interval [a, b], and is denoted by $\int_a^b f(x)\,dg(x)$. In particular, if g(x) = x we obtain the Riemann integral $\int_a^b f(x)\,dx$. Thus the Riemann integral is a special case of the Riemann-Stieltjes integral.

The definition of the Riemann-Stieltjes integral of f(x) with respect to g(x) on [a, b] is similar to that of the Riemann integral. If f(x) is bounded on [a, b], if g(x) is monotone increasing on [a, b], and if $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of [a, b], then as in Section 6.2, we define the sums

$$LS_P(f, g) = \sum_{i=1}^n m_i\,\Delta g_i, \qquad US_P(f, g) = \sum_{i=1}^n M_i\,\Delta g_i,$$

where $m_i$ and $M_i$ are, respectively, the infimum and supremum of f(x) on $[x_{i-1}, x_i]$, and $\Delta g_i = g(x_i) - g(x_{i-1})$, i = 1, 2, ..., n. If for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$US_P(f, g) - LS_P(f, g) < \epsilon \qquad (6.43)$$

whenever $\Delta_p < \delta$, where $\Delta_p$ is the norm of P, then f(x) is said to be Riemann-Stieltjes integrable with respect to g(x) on [a, b]. In this case,

$$\int_a^b f(x)\,dg(x) = \inf_P US_P(f, g) = \sup_P LS_P(f, g).$$

Condition (6.43) is both necessary and sufficient for the existence of the Riemann-Stieltjes integral.

Equivalently, suppose that for a given partition $P = \{x_0, x_1, \ldots, x_n\}$ we define the sum

$$S(P, f, g) = \sum_{i=1}^n f(t_i)\,\Delta g_i, \qquad (6.44)$$

where $t_i$ is a point in the interval $[x_{i-1}, x_i]$, i = 1, 2, ..., n. Then f(x) is Riemann-Stieltjes integrable with respect to g(x) on [a, b] if for any $\epsilon > 0$ there exists a $\delta > 0$ such that

$$\left|S(P, f, g) - \int_a^b f(x)\,dg(x)\right| < \epsilon \qquad (6.45)$$

for any partition P of [a, b] with a norm $\Delta_p < \delta$, and for any choice of the point $t_i$ in $[x_{i-1}, x_i]$, i = 1, 2, ..., n.

Theorems concerning the Riemann-Stieltjes integral are very similar to those seen earlier concerning the Riemann integral. In particular, we have the following theorems:

Theorem 6.8.1. If f(x) is continuous on [a, b], then f(x) is Riemann-Stieltjes integrable with respect to g(x) on [a, b].

Proof. See Exercise 6.21. □

Theorem 6.8.2. If f(x) is monotone increasing (or monotone decreasing) on [a, b] and g(x) is continuous on [a, b], then f(x) is Riemann-Stieltjes integrable with respect to g(x) on [a, b].

Proof. See Exercise 6.22. □

The next theorem shows that under certain conditions, the Riemann-Stieltjes integral reduces to the Riemann integral.

Theorem 6.8.3. Suppose that f(x) is Riemann-Stieltjes integrable with respect to g(x) on [a, b], where g(x) has a continuous derivative g'(x) on [a, b]. Then

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx.$$

Proof. Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [a, b]. Consider the sum

$$S(P, h) = \sum_{i=1}^n h(t_i)\,\Delta x_i, \qquad (6.46)$$

where $h(x) = f(x)g'(x)$ and $x_{i-1} \le t_i \le x_i$, i = 1, 2, ..., n. Let us also consider the sum

$$S(P, f, g) = \sum_{i=1}^n f(t_i)\,\Delta g_i. \qquad (6.47)$$

If we apply the mean value theorem (Theorem 4.2.2) to g(x), we obtain

$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(z_i)\,\Delta x_i, \qquad i = 1, 2, \ldots, n, \qquad (6.48)$$

where $x_{i-1} < z_i < x_i$, i = 1, 2, ..., n. From (6.46), (6.47), and (6.48) we can then write

$$S(P, f, g) - S(P, h) = \sum_{i=1}^n f(t_i)[g'(z_i) - g'(t_i)]\,\Delta x_i. \qquad (6.49)$$

Since f(x) is bounded on [a, b] and g'(x) is uniformly continuous on [a, b] by Theorem 3.4.6, then for a given $\epsilon > 0$ there exists a $\delta_1 > 0$, which depends only on $\epsilon$, such that

$$|g'(z_i) - g'(t_i)| < \frac{\epsilon}{2M(b-a)}$$

if $|z_i - t_i| < \delta_1$, where M > 0 is such that $|f(x)| \le M$ on [a, b]. Since $z_i$ and $t_i$ both lie in $[x_{i-1}, x_i]$, it follows from (6.49) that if the partition P has a norm $\Delta_p < \delta_1$, then

$$|S(P, f, g) - S(P, h)| < \frac{\epsilon}{2}. \qquad (6.50)$$

Now, since f(x) is Riemann-Stieltjes integrable with respect to g(x) on [a, b], then by definition, for the given $\epsilon > 0$ there exists a $\delta_2 > 0$ such that

$$\left|S(P, f, g) - \int_a^b f(x)\,dg(x)\right| < \frac{\epsilon}{2} \qquad (6.51)$$

if the norm $\Delta_p$ of P is less than $\delta_2$. We conclude from (6.50) and (6.51) that if the norm of P is less than $\min(\delta_1, \delta_2)$, then

$$\left|S(P, h) - \int_a^b f(x)\,dg(x)\right| < \epsilon.$$

Since $\epsilon$ is arbitrary, this inequality implies that $\int_a^b f(x)\,dg(x)$ is in fact the Riemann integral $\int_a^b h(x)\,dx = \int_a^b f(x)g'(x)\,dx$. □

Using Theorem 6.8.3, it is easy to see that if, for example, f(x) = 1 and $g(x) = x^2$, then $\int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx = \int_a^b 2x\,dx = b^2 - a^2$.
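Definition (6.44) translates directly into code. The sketch below (the helper `rs_sum` is an illustrative name, not from the text) forms S(P, f, g) on uniform partitions of [1, 3] with randomly chosen tags $t_i$. For f = 1 and $g(x) = x^2$ the sum telescopes to $g(b) - g(a) = 8$ for every partition, and for f(x) = x it approaches $\int_1^3 x \cdot 2x\,dx = 2(3^3 - 1^3)/3 = 52/3$, as Theorem 6.8.3 predicts.

```python
import random


def rs_sum(f, g, a, b, n, rng):
    """Riemann-Stieltjes sum S(P, f, g) of (6.44) for the uniform partition of [a, b]
    into n subintervals, with each tag t_i drawn at random from [x_{i-1}, x_i]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(rng.uniform(xs[i - 1], xs[i])) * (g(xs[i]) - g(xs[i - 1]))
               for i in range(1, n + 1))


rng = random.Random(0)           # fixed seed so the run is reproducible
a, b = 1.0, 3.0
for n in (10, 100, 10_000):
    s_const = rs_sum(lambda x: 1.0, lambda x: x * x, a, b, n, rng)   # telescopes to 8
    s_linear = rs_sum(lambda x: x, lambda x: x * x, a, b, n, rng)    # tends to 52/3
    print(n, s_const, s_linear, 52 / 3)
```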

It should be noted that Theorems 6.8.1 and 6.8.2 provide sufficient conditions for the existence of $\int_a^b f(x)\,dg(x)$. It is possible, however, for the Riemann-Stieltjes integral to exist even if g(x) is a discontinuous function. For example, consider the function $g(x) = \gamma I(x - c)$, where $\gamma$ is a nonzero constant, a < c < b, and I(x - c) is such that

$$I(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \ge c. \end{cases}$$

The quantity $\gamma$ represents what is called a jump at x = c. If f(x) is bounded on [a, b] and is continuous at x = c, then

$$\int_a^b f(x)\,dg(x) = \gamma f(c). \qquad (6.52)$$

To show the validity of formula (6.52), let $P = \{x_0, x_1, \ldots, x_n\}$ be any partition of [a, b]. Then $\Delta g_i = g(x_i) - g(x_{i-1})$ will be zero as long as $x_i < c$ or $x_{i-1} \ge c$. Suppose, therefore, that there exists a k, $1 \le k \le n$, such that $x_{k-1} < c \le x_k$. In this case, the sum S(P, f, g) in formula (6.44) takes the form

$$S(P, f, g) = \sum_{i=1}^n f(t_i)\,\Delta g_i = \gamma f(t_k).$$

It follows that

$$|S(P, f, g) - \gamma f(c)| = |\gamma|\,|f(t_k) - f(c)|. \qquad (6.53)$$

Now, let $\epsilon > 0$ be given. Since f(x) is continuous at x = c, then there exists a $\delta > 0$ such that

$$|f(t_k) - f(c)| < \frac{\epsilon}{|\gamma|}$$

if $|t_k - c| < \delta$. Thus if the norm of P is chosen so that $\Delta_p < \delta$ (so that $|t_k - c| \le \Delta_p < \delta$, since $t_k$ and c both lie in $[x_{k-1}, x_k]$), then

$$|S(P, f, g) - \gamma f(c)| < \epsilon. \qquad (6.54)$$

Equality (6.52) follows from comparing inequalities (6.45) and (6.54).

It is now easy to show that if

$$g(x) = \begin{cases} \lambda, & a \le x < b, \\ \lambda', & x = b, \end{cases}$$

and if f(x) is continuous at x = b, then

$$\int_a^b f(x)\,dg(x) = (\lambda' - \lambda)f(b). \qquad (6.55)$$

The previous examples represent special cases of a class of functions defined on [a, b] called step functions. These functions are constant on [a, b] except for a finite number of jump discontinuities. We can generalize formula (6.55) to this class of functions as can be seen in the next theorem.

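Theorem 6.8.4. Let g(x) be a step function defined on [a, b] with jump discontinuities at $x = c_1, c_2, \ldots, c_n$, and $a < c_1 < c_2 < \cdots < c_n = b$, such that

$$g(x) = \begin{cases} \lambda_1, & a \le x < c_1, \\ \lambda_2, & c_1 \le x < c_2, \\ \vdots & \vdots \\ \lambda_n, & c_{n-1} \le x < c_n, \\ \lambda_{n+1}, & x = c_n. \end{cases}$$

If f(x) is bounded on [a, b] and is continuous at $x = c_1, c_2, \ldots, c_n$, then

$$\int_a^b f(x)\,dg(x) = \sum_{i=1}^n (\lambda_{i+1} - \lambda_i) f(c_i). \qquad (6.56)$$

Proof. The proof can be easily obtained by first writing the integral in formula (6.56) as

$$\int_a^b f(x)\,dg(x) = \int_a^{c_1} f(x)\,dg(x) + \int_{c_1}^{c_2} f(x)\,dg(x) + \cdots + \int_{c_{n-1}}^{c_n} f(x)\,dg(x). \qquad (6.57)$$

If we now apply formula (6.55) to each integral in (6.57) we obtain

$$\int_a^{c_1} f(x)\,dg(x) = (\lambda_2 - \lambda_1) f(c_1),$$

$$\int_{c_1}^{c_2} f(x)\,dg(x) = (\lambda_3 - \lambda_2) f(c_2),$$

$$\vdots$$

$$\int_{c_{n-1}}^{c_n} f(x)\,dg(x) = (\lambda_{n+1} - \lambda_n) f(c_n).$$

By adding up all these integrals we obtain formula (6.56). □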

Example 6.8.1. One example of a step function is the greatest-integer function [x], which is defined as the greatest integer less than or equal to x. If f(x) is bounded on [0, n] and is continuous at x = 1, 2, ..., n, where n is a positive integer, then by Theorem 6.8.4 we can write

$$\int_0^n f(x)\,d[x] = \sum_{i=1}^n f(i). \qquad (6.58)$$

It follows that every finite sum of the form $\sum_{i=1}^n a_i$ can be expressed as a Riemann-Stieltjes integral with respect to [x] of a function f(x) continuous on [0, n] such that $f(i) = a_i$, i = 1, 2, ..., n. The Riemann-Stieltjes integral has therefore the distinct advantage of making finite sums expressible as integrals.
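Formula (6.58) can be observed directly from the sums (6.44). In the sketch below, `rs_sum` is an illustrative helper that forms S(P, f, g) on a uniform partition with random tags, `math.floor` plays the role of [x], and $f(x) = \cos x + x^2$ is an arbitrary continuous choice with n = 5. As the partition is refined, only the subintervals containing an integer contribute, and the sum approaches $\sum_{i=1}^5 f(i)$.

```python
import math
import random


def rs_sum(f, g, a, b, m, rng):
    """S(P, f, g) of (6.44) for the uniform partition of [a, b] into m pieces,
    with random tags t_i in [x_{i-1}, x_i]."""
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    return sum(f(rng.uniform(xs[i - 1], xs[i])) * (g(xs[i]) - g(xs[i - 1]))
               for i in range(1, m + 1))


def f(x):
    return math.cos(x) + x ** 2          # continuous at every integer


n = 5
target = sum(f(i) for i in range(1, n + 1))       # right-hand side of (6.58)
rng = random.Random(1)
for m in (7, 70, 70_001):
    approx = rs_sum(f, math.floor, 0.0, float(n), m, rng)   # math.floor plays the role of [x]
    print(m, approx, target)
```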


6.9. APPLICATIONS IN STATISTICS

Riemann integration plays an important role in statistical distribution theory. Perhaps the most prevalent use of the Riemann integral is in the study of the distributions of continuous random variables.

We recall from Section 4.5.1 that a continuous random variable X with a cumulative distribution function $F(x) = P(X \le x)$ is absolutely continuous if F(x) is differentiable. In this case, there exists a function f(x) called the density function of X such that F'(x) = f(x), that is,

$$F(x) = \int_{-\infty}^{x} f(t)\,dt. \qquad (6.59)$$

In general, if X is a continuous random variable, it need not be absolutely continuous. It is true, however, that most common distributions that are continuous are also absolutely continuous.

The probability distribution of an absolutely continuous random variable is completely determined by its density function. For example, from (6.59) it follows that

$$P(a < X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx. \qquad (6.60)$$

Note that the value of this probability remains unchanged if one or both of the end points of the interval are included or excluded. This is true because the probability assigned to these individual points is zero when X has a continuous distribution. The mean $\mu$ and variance $\sigma^2$ of X are given by

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

$$\sigma^2 = \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$



In general, the kth central moment of X, denoted by $\mu_k$ (k = 1, 2, ...), is defined as

$$\mu_k = E[(X - \mu)^k] = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx. \qquad (6.61)$$

We note that $\sigma^2 = \mu_2$. Similarly, the kth noncentral moment of X, denoted by $\mu_k'$ (k = 1, 2, ...), is defined as

$$\mu_k' = E(X^k). \qquad (6.62)$$

The first noncentral moment of X is its mean $\mu$, while its first central moment is equal to zero.

We note that if the domain of the density function f(x) is infinite, then $\mu_k$ and $\mu_k'$ are improper integrals of the first kind. Therefore, they may or may not exist. If $\int_{-\infty}^{\infty} |x|^k f(x)\,dx$ exists, then so does $\mu_k'$ (see Definition 6.5.2 concerning absolute convergence of improper integrals). The latter integral is called the kth absolute moment and is denoted by $\nu_k$. If $\nu_k$ exists, then the noncentral moments of order j for $j \le k$ exist also. This follows because of the inequality

$$|x^j| \le |x|^k + 1 \quad \text{if } j \le k, \qquad (6.63)$$

which is true because $|x|^j \le |x|^k$ if $|x| \ge 1$ and $|x|^j \le 1$ if $|x| < 1$. Hence, $|x^j| \le |x|^k + 1$ for all x. Consequently, from (6.63) we obtain the inequality

$$\nu_j \le \nu_k + 1, \qquad j \le k,$$

which implies that $\mu_j'$ exists for $j \le k$. Since the central moment $\mu_j$ in formula (6.61) is expressible in terms of noncentral moments of order j or smaller, the existence of $\nu_j$ also implies the existence of $\mu_j$.

Example 6.9.1. Consider a random variable X with the density function

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

Such a random variable has the so-called Cauchy distribution. Its mean $\mu$ does not exist. This follows from the fact that in order for $\mu$ to exist, the two limits in the following formula must exist:

$$\mu = \frac{1}{\pi}\lim_{a \to \infty} \int_{-a}^{0} \frac{x\,dx}{1 + x^2} + \frac{1}{\pi}\lim_{b \to \infty} \int_{0}^{b} \frac{x\,dx}{1 + x^2}. \qquad (6.64)$$

But $\int_{-a}^{0} x\,dx/(1 + x^2) = -\frac{1}{2}\log(1 + a^2) \to -\infty$ as $a \to \infty$, and $\int_0^b x\,dx/(1 + x^2) = \frac{1}{2}\log(1 + b^2) \to \infty$ as $b \to \infty$. The integral $\int_{-\infty}^{\infty} x\,dx/(1 + x^2)$ is therefore divergent, and hence $\mu$ does not exist.

It should be noted that it would be incorrect to state that

$$\mu = \frac{1}{\pi}\lim_{a \to \infty} \int_{-a}^{a} \frac{x\,dx}{1 + x^2}, \qquad (6.65)$$

which is equal to zero. This is because the limits in (6.64) must exist for any a and b that tend to infinity independently of each other. The limit in formula (6.65) requires that a = b. Such a limit is therefore considered as a subsequential limit.

The higher-order moments of the Cauchy distribution do not exist either. It is easy to verify that

$$\mu_k' = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x^k}{1 + x^2}\,dx \qquad (6.66)$$

is divergent for k > 1.
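The dependence on how a and b tend to infinity can be made concrete with the antiderivative $\frac{1}{2}\log(1 + x^2)$. The sketch below (the function name is illustrative) evaluates $\frac{1}{\pi}\int_{-a}^{b} x\,dx/(1 + x^2)$ along three paths: b = a gives 0, matching (6.65); b = 2a gives values approaching $\log 2/\pi \approx 0.22$; and $b = a^2$ gives values that grow without bound. Since different paths give different answers, the limits in (6.64) cannot combine into a single finite value.

```python
import math


def truncated_mean(a, b):
    """(1/pi) * integral over [-a, b] of x / (1 + x^2), via the antiderivative log(1 + x^2) / 2."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)


for a in (10.0, 1e3, 1e6):
    print(a,
          truncated_mean(a, a),        # b = a: always 0
          truncated_mean(a, 2 * a),    # b = 2a: approaches log(2)/pi
          truncated_mean(a, a * a))    # b = a^2: grows without bound
```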

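Example 6.9.2. Consider a random variable X that has the logistic distribution with the density function

$$f(x) = \frac{e^x}{(1 + e^x)^2}, \qquad -\infty < x < \infty.$$

The mean of X is

$$\mu = \int_{-\infty}^{\infty} \frac{x e^x}{(1 + e^x)^2}\,dx = \int_0^1 \log\left(\frac{u}{1 - u}\right) du, \qquad (6.67)$$

where $u = e^x/(1 + e^x)$. We recognize the integral in (6.67) as being an improper integral of the second kind with singularities at u = 0 and u = 1. We therefore write

$$\mu = \lim_{a \to 0^+} \int_a^{1/2} \log\left(\frac{u}{1 - u}\right) du + \lim_{b \to 1^-} \int_{1/2}^{b} \log\left(\frac{u}{1 - u}\right) du. \qquad (6.68)$$

Thus,

$$\mu = \lim_{a \to 0^+} \left[u\log u + (1 - u)\log(1 - u)\right]_a^{1/2} + \lim_{b \to 1^-} \left[u\log u + (1 - u)\log(1 - u)\right]_{1/2}^{b}$$

$$= \lim_{b \to 1^-} \left[b\log b + (1 - b)\log(1 - b)\right] - \lim_{a \to 0^+} \left[a\log a + (1 - a)\log(1 - a)\right]. \qquad (6.69)$$

By applying l'Hospital's rule (Theorem 4.2.6) we find that

$$\lim_{b \to 1^-} (1 - b)\log(1 - b) = \lim_{a \to 0^+} a\log a = 0.$$

We thus have

$$\mu = \lim_{b \to 1^-} (b\log b) - \lim_{a \to 0^+} \left[(1 - a)\log(1 - a)\right] = 0.$$

The variance of X can be shown to be equal to $\pi^2/3$ (see Exercise 6.24).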
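Both the zero mean and the stated variance $\pi^2/3$ can be confirmed numerically. The sketch below evaluates the density in the equivalent form $e^{-|x|}/(1 + e^{-|x|})^2$ (the density is even, and this form avoids overflow), truncates the real line to [-40, 40], where the omitted tails are of order $e^{-40}$, and applies composite Simpson's rule; the truncation point, step count, and names are illustrative assumptions.

```python
import math


def logistic_density(x):
    """e^x / (1 + e^x)^2, evaluated as e^{-|x|} / (1 + e^{-|x|})^2 (same value, since
    the density is even) so that exp() never overflows."""
    e = math.exp(-abs(x))
    return e / (1 + e) ** 2


def simpson(h, a, b, n=40_000):
    """Composite Simpson's rule for the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b) + sum((4 if i % 2 else 2) * h(a + i * step) for i in range(1, n))
    return total * step / 3


L = 40.0     # truncation point; the tails beyond |x| = 40 are of order e^{-40}
mass = simpson(logistic_density, -L, L)
mean = simpson(lambda x: x * logistic_density(x), -L, L)
variance = simpson(lambda x: x * x * logistic_density(x), -L, L) - mean ** 2
print(mass, mean, variance, math.pi ** 2 / 3)
```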


6.9.1. The Existence of the First Negative Moment of a Continuous Distribution

Let X be a continuous random variable with a density function f(x). By definition, the first negative moment of X is $E(X^{-1})$. The existence of such a moment will be explored in this section.

The need to evaluate a first negative moment can arise in many practical applications. Here are some examples.

Example 6.9.3. Let $\mathcal{P}$ be a population with a mean $\mu$ and a variance $\sigma^2$. The coefficient of variation is a measure of variation in the population per unit mean and is equal to $\sigma/|\mu|$, assuming that $\mu \ne 0$. An estimate of this ratio is $s/|\bar{y}|$, where s and $\bar{y}$ are, respectively, the sample standard deviation and sample mean of a sample randomly chosen from $\mathcal{P}$. If the population is normally distributed, then $\bar{y}$ is also normally distributed and is statistically independent of s. In this case, $E(s/|\bar{y}|) = E(s)E(1/|\bar{y}|)$. The question now is whether $E(1/|\bar{y}|)$ exists or not.

Example 6.9.4 (Calibration or Inverse Regression). Consider the simple linear regression model

$$E(y) = \beta_0 + \beta_1 x. \qquad (6.70)$$

In most regression situations, the interest is in predicting the response y for a given value of x. For this purpose we use the prediction equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least-squares estimators of $\beta_0$ and $\beta_1$, respectively. These are obtained from the data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ that results from running n experiments in which y is measured for specified settings of x. There are other situations, however, where the interest is in predicting the value of x, say $x_0$, that corresponds to an observed value of y, say $y_0$. This is an inverse regression problem known as the calibration problem (see Graybill, 1976, Section 8.5; Montgomery and Peck, 1982, Section 9.7).

For example, in calibrating a new type of thermometer, n readings, $y_1, y_2, \ldots, y_n$, are taken at predetermined known temperature values, $x_1, x_2, \ldots, x_n$ (these values are known by using a standard temperature gauge). Suppose that the relationship between the $x_i$'s and the $y_i$'s is well represented by the model in (6.70). If a new reading $y_0$ is observed using the new thermometer, then it is of interest to estimate the correct temperature $x_0$ (that is, the temperature on the standard gauge corresponding to the observed temperature reading $y_0$).

In another calibration problem, the date of delivery of a pregnant woman can be estimated by the size y of the head of her unborn child, which can be determined by a special electronic device (sonogram). If the relationship between y and the number of days x left until delivery is well represented by model (6.70), then for a measured value of y, say $y_0$, it is possible to estimate $x_0$, the corresponding value of x.

In general, from model (6.70) we have $E(y_0) = \beta_0 + \beta_1 x_0$. If $\beta_1 \ne 0$, we can solve for $x_0$ and obtain

$$x_0 = \frac{E(y_0) - \beta_0}{\beta_1}.$$

Hence, to estimate $x_0$ we use

$$\hat{x}_0 = \frac{y_0 - \hat{\beta}_0}{\hat{\beta}_1} = \bar{x} + \frac{y_0 - \bar{y}}{\hat{\beta}_1},$$

since $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, where $\bar{x} = (1/n)\sum_{i=1}^n x_i$ and $\bar{y} = (1/n)\sum_{i=1}^n y_i$. If the response y is normally distributed with a variance $\sigma^2$, then $\bar{y}$ and $\hat{\beta}_1$ are statistically independent. Since $y_0$ is also statistically independent of $\hat{\beta}_1$ ($y_0$ does not belong to the data set used to estimate $\beta_1$), then the expected value of $\hat{x}_0$ is given by

$$E(\hat{x}_0) = \bar{x} + E\left[\frac{y_0 - \bar{y}}{\hat{\beta}_1}\right] = \bar{x} + E(y_0 - \bar{y})\,E\left(\frac{1}{\hat{\beta}_1}\right).$$

Here again it is of interest to know if $E(1/\hat{\beta}_1)$ exists.



Now, suppose that the density function f(x) of the continuous random variable X is defined on $(0, \infty)$. Let us also assume that f(x) is continuous. The expected value of $X^{-1}$ is

$$E(X^{-1}) = \int_0^\infty \frac{f(x)}{x}\,dx. \qquad (6.71)$$

This is an improper integral with a singularity at x = 0. In particular, if f(0) > 0, then $E(X^{-1})$ does not exist, because

$$\lim_{x \to 0^+} \frac{f(x)/x}{1/x} = f(0) > 0.$$

By Theorem 6.5.7, the integrals $\int_0^1 (f(x)/x)\,dx$ and $\int_0^1 (dx/x)$ are either both convergent or both divergent. Since the latter is divergent, then so is the former. Note that if f(x) is defined on $(-\infty, \infty)$ and f(0) > 0, then $E(X^{-1})$ does not exist either. In this case,

$$E(X^{-1}) = \int_{-\infty}^{0} \frac{f(x)}{x}\,dx + \int_0^\infty \frac{f(x)}{x}\,dx.$$

Both integrals on the right-hand side are divergent.

A sufficient condition for the existence of $E(X^{-1})$ is given by the following theorem [see Piegorsch and Casella (1985)]:

Theorem 6.9.1. Let f(x) be a continuous density function for a random variable X defined on $(0, \infty)$. If

$$\lim_{x \to 0^+} \frac{f(x)}{x^\alpha} < \infty \quad \text{for some } \alpha > 0, \qquad (6.72)$$

then $E(X^{-1})$ exists.

Proof. Since the limit of $f(x)/x^\alpha$ is finite as $x \to 0^+$, there exist finite constants M and $\delta > 0$ such that $f(x)/x^\alpha \le M$ if $0 < x \le \delta$. Hence,

$$\int_0^\delta \frac{f(x)}{x}\,dx \le M\int_0^\delta x^{\alpha - 1}\,dx = \frac{M\delta^\alpha}{\alpha}.$$

Thus,

$$E(X^{-1}) = \int_0^\delta \frac{f(x)}{x}\,dx + \int_\delta^\infty \frac{f(x)}{x}\,dx \le \frac{M\delta^\alpha}{\alpha} + \frac{1}{\delta}\int_\delta^\infty f(x)\,dx \le \frac{M\delta^\alpha}{\alpha} + \frac{1}{\delta} < \infty. \quad □$$

It should be noted that the condition of Theorem 6.9.1 is not a necessary one. Piegorsch and Casella (1985) give an example of a family of density functions that all violate condition (6.72), with some members having a finite first negative moment and others not having one (see Exercise 6.25).

Corollary 6.9.1. Let f(x) be a continuous density function for a random variable X defined on $(0, \infty)$ such that f(0) = 0. If f'(0) exists and is finite, then $E(X^{-1})$ exists.

Proof. We have that

$$\lim_{x \to 0^+} \frac{f(x) - f(0)}{x} = \lim_{x \to 0^+} \frac{f(x)}{x} = f'(0).$$

By applying Theorem 6.9.1 with $\alpha = 1$ we conclude that $E(X^{-1})$ exists. □

Example 6.9.5. Let X be a normal random variable with a mean $\mu$ and a variance $\sigma^2$. Its density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right], \qquad -\infty < x < \infty.$$

In this example, f(0) > 0. Hence, $E(X^{-1})$ does not exist. Consequently, $E(1/|\bar{y}|)$ in Example 6.9.3 does not exist if the population $\mathcal{P}$ is normally distributed, since the density function of $|\bar{y}|$ is positive at zero. Also, in Example 6.9.4, $E(1/\hat{\beta}_1)$ does not exist, because $\hat{\beta}_1$ is normally distributed if the response y satisfies the assumption of normality.

Example 6.9.6. Let X be a continuous random variable with the density function

$$f(x) = \frac{1}{\Gamma(n/2)\,2^{n/2}}\,x^{n/2 - 1}e^{-x/2}, \qquad 0 < x < \infty,$$

where n is a positive integer and $\Gamma(n/2)$ is the value of the gamma function, $\Gamma(m) = \int_0^\infty e^{-x}x^{m-1}\,dx$, at m = n/2. This is the density function of a chi-squared random variable with n degrees of freedom.

Let us consider the limit

$$\lim_{x \to 0^+} \frac{f(x)}{x^\alpha} = \frac{1}{\Gamma(n/2)\,2^{n/2}}\lim_{x \to 0^+} x^{n/2 - \alpha - 1}e^{-x/2}$$

for $\alpha > 0$. This limit exists and is equal to zero if $n/2 - \alpha - 1 > 0$, that is, if $n > 2(1 + \alpha) > 2$. If n > 2, we can always choose $\alpha$ with $0 < \alpha < n/2 - 1$. Thus by Theorem 6.9.1, $E(X^{-1})$ exists if the number of degrees of freedom exceeds 2.
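As a numerical cross-check of Example 6.9.6, the sketch below evaluates $E(X^{-1}) = \int_0^\infty f(x)/x\,dx$ for several n > 2 and compares it with $1/(n - 2)$, the standard closed form for the chi-squared distribution (a known fact that the text does not derive). The substitution $x = t^2$ removes the integrable singularity at zero when n = 3, and the integral is truncated at t = 15 (that is, x = 225), where the neglected tail is negligible; the helper names and the cutoff are illustrative assumptions.

```python
import math


def simpson(h, a, b, n=20_000):
    """Composite Simpson's rule for the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b) + sum((4 if i % 2 else 2) * h(a + i * step) for i in range(1, n))
    return total * step / 3


def inverse_moment_integrand(n):
    """Integrand of E(1/X) for chi-squared(n), after the substitution x = t^2.

    f(x)/x dx = x^(n/2 - 2) e^(-x/2) dx / (Gamma(n/2) 2^(n/2)); with x = t^2 and
    dx = 2t dt this becomes 2 t^(n-3) e^(-t^2/2) dt / (Gamma(n/2) 2^(n/2)).
    """
    c = 2.0 / (math.gamma(n / 2) * 2 ** (n / 2))
    return lambda t: c * t ** (n - 3) * math.exp(-t * t / 2)


T = 15.0     # upper limit in t, i.e. x = 225
estimates = {n: simpson(inverse_moment_integrand(n), 0.0, T) for n in (3, 4, 6, 10)}
for n, value in estimates.items():
    print(n, value, 1 / (n - 2))
```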

More recently, Khuri and Casella (2002) presented several extensions and generalizations of the results in Piegorsch and Casella (1985), including a necessary and sufficient condition for the existence of $E(X^{-1})$.


6.9.2. Transformation of Continuous Random Variables

Let Y be a continuous random variable with a density function f(y). Let $W = \psi(Y)$, where $\psi(y)$ is a function whose derivative exists and is continuous on a set A. Suppose that $\psi'(y) \ne 0$ for all $y \in A$, that is, $\psi(y)$ is strictly monotone. We recall from Section 4.5.1 that the density function g(w) of W is given by

$$g(w) = f[\psi^{-1}(w)]\left|\frac{d\psi^{-1}(w)}{dw}\right|, \qquad w \in B, \qquad (6.73)$$

where B is the image of A under $\psi$ and $y = \psi^{-1}(w)$ is the inverse function of $w = \psi(y)$. This result can be easily obtained by applying the change of variables technique in Section 6.4.1. This is done as follows: We have that for any $w_1$ and $w_2$ such that $w_1 \le w_2$,

$$P(w_1 \le W \le w_2) = \int_{w_1}^{w_2} g(w)\,dw. \qquad (6.74)$$

If $\psi'(y) > 0$, then $\psi(y)$ is strictly monotone increasing. Hence,

$$P(w_1 \le W \le w_2) = P(y_1 \le Y \le y_2), \qquad (6.75)$$

where $y_1$ and $y_2$ are such that $w_1 = \psi(y_1)$ and $w_2 = \psi(y_2)$. But

$$P(y_1 \le Y \le y_2) = \int_{y_1}^{y_2} f(y)\,dy. \qquad (6.76)$$

Let us now apply the change of variables $y = \psi^{-1}(w)$ to the integral in (6.76). By Theorem 6.4.9 we have

$$\int_{y_1}^{y_2} f(y)\,dy = \int_{w_1}^{w_2} f[\psi^{-1}(w)]\,\frac{d\psi^{-1}(w)}{dw}\,dw. \qquad (6.77)$$

From (6.75), (6.76), and (6.77) we obtain

$$P(w_1 \le W \le w_2) = \int_{w_1}^{w_2} f[\psi^{-1}(w)]\,\frac{d\psi^{-1}(w)}{dw}\,dw. \qquad (6.78)$$

On the other hand, if $\psi'(y) < 0$, then $P(w_1 \le W \le w_2) = P(y_2 \le Y \le y_1)$. Consequently,

$$P(w_1 \le W \le w_2) = \int_{y_2}^{y_1} f(y)\,dy = \int_{w_2}^{w_1} f[\psi^{-1}(w)]\,\frac{d\psi^{-1}(w)}{dw}\,dw = \int_{w_1}^{w_2} f[\psi^{-1}(w)]\left[-\frac{d\psi^{-1}(w)}{dw}\right]dw. \qquad (6.79)$$

Since $d\psi^{-1}(w)/dw$ is positive in the first case and negative in the second, combining (6.78) and (6.79) we obtain

$$P(w_1 \le W \le w_2) = \int_{w_1}^{w_2} f[\psi^{-1}(w)]\left|\frac{d\psi^{-1}(w)}{dw}\right|dw. \qquad (6.80)$$

Formula (6.73) now follows from comparing (6.74) and (6.80).


The Case Where $w = \psi(y)$ Has No Unique Inverse

Formula (6.73) requires that $w = \psi(y)$ has a unique inverse, the existence of which is guaranteed by the nonvanishing of the derivative $\psi'(y)$. Let us now consider the following extension: The function $\psi(y)$ is continuously differentiable, but its derivative can vanish at a finite number of points inside its domain. We assume that the domain of $\psi(y)$ can be partitioned into a finite number, say $n$, of disjoint subdomains, denoted by $I_1, I_2, \ldots, I_n$, on each of which $\psi(y)$ is strictly monotone (decreasing or increasing). Hence, on each $I_i$ $(i = 1, 2, \ldots, n)$, $\psi(y)$ has a unique inverse. Let $\psi_i$ denote the restriction of the function $\psi$ to $I_i$, that is, $\psi_i(y)$ has a unique inverse, $y = \psi_i^{-1}(w)$, $i = 1, 2, \ldots, n$. Since $I_1, I_2, \ldots, I_n$ are disjoint, for any $w_1$ and $w_2$ such that $w_1 \le w_2$ we have

$$P(w_1 \le W \le w_2) = \sum_{i=1}^{n} P[Y \in \psi_i^{-1}([w_1, w_2])], \tag{6.81}$$



248 


INTEGRATION 


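where $\psi_i^{-1}([w_1, w_2])$ is the inverse image of $[w_1, w_2]$, which is a subset of $I_i$, $i = 1, 2, \ldots, n$. Now, on the $i$th subdomain we have

$$P[Y \in \psi_i^{-1}([w_1, w_2])] = \int_{\psi_i^{-1}([w_1, w_2])} f(y)\,dy = \int_{T_i} f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right| dw, \quad i = 1, 2, \ldots, n, \tag{6.82}$$

where $T_i$ is the image of $\psi_i^{-1}([w_1, w_2])$ under $\psi_i$. Formula (6.82) follows from applying formula (6.80) to the function $\psi_i(y)$, $i = 1, 2, \ldots, n$. Note that $T_i = \psi_i[\psi_i^{-1}([w_1, w_2])] = [w_1, w_2] \cap \psi_i(I_i)$, $i = 1, 2, \ldots, n$ (why?). Thus $T_i$ is a subset of both $[w_1, w_2]$ and $\psi_i(I_i)$. We can therefore write the integral in (6.82) as

$$\int_{T_i} f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right| dw = \int_{w_1}^{w_2} \delta_i(w)\,f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right| dw, \tag{6.83}$$

where $\delta_i(w) = 1$ if $w \in \psi_i(I_i)$ and $\delta_i(w) = 0$ otherwise, $i = 1, 2, \ldots, n$. Using (6.82) and (6.83) in formula (6.81), we obtain

$$P(w_1 \le W \le w_2) = \sum_{i=1}^{n}\int_{w_1}^{w_2} \delta_i(w)\,f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right| dw = \int_{w_1}^{w_2}\sum_{i=1}^{n} \delta_i(w)\,f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right| dw,$$

from which we deduce that the density function of $W$ is given by

$$g(w) = \sum_{i=1}^{n} \delta_i(w)\,f[\psi_i^{-1}(w)]\left|\frac{d\psi_i^{-1}(w)}{dw}\right|. \tag{6.84}$$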
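Formula (6.84) amounts to summing the single-branch formula over the monotone pieces. As a sketch, assume $Y$ is uniform on $(-1, 2)$ (an illustrative choice, not from the text). Then $W = Y^2$ has the branches $I_1 = (-1, 0]$ and $I_2 = (0, 2)$ with $\psi_1(I_1) = [0, 1)$ and $\psi_2(I_2) = (0, 4)$, so $g(w) = 1/(3\sqrt{w})$ on $(0, 1)$ and $g(w) = 1/(6\sqrt{w})$ on $[1, 4)$. The function name `density_via_branches` is hypothetical; each branch carries its inverse, the derivative of that inverse, and the indicator $\delta_i$.

```python
import math

def density_via_branches(w, f, branches):
    """Formula (6.84): g(w) = sum_i delta_i(w) * f[psi_i^{-1}(w)] * |d psi_i^{-1}(w)/dw|.

    Each branch is a tuple (inverse, d_inverse, in_range); in_range(w) plays the
    role of the indicator delta_i(w), i.e. it is True when w lies in psi_i(I_i).
    """
    total = 0.0
    for inverse, d_inverse, in_range in branches:
        if in_range(w):
            total += f(inverse(w)) * abs(d_inverse(w))
    return total

def f_Y(y):
    """Hypothetical example: density of Y uniform on (-1, 2)."""
    return 1.0 / 3.0 if -1.0 < y < 2.0 else 0.0

# W = Y**2. The point w = 0 is left out of both ranges: it carries zero probability
# and the Jacobian 1/(2 sqrt(w)) is unbounded there.
branches = [
    # I_1 = (-1, 0]: inverse -sqrt(w), psi_1(I_1) = [0, 1)
    (lambda w: -math.sqrt(w), lambda w: -0.5 / math.sqrt(w), lambda w: 0.0 < w < 1.0),
    # I_2 = (0, 2): inverse +sqrt(w), psi_2(I_2) = (0, 4)
    (lambda w: math.sqrt(w), lambda w: 0.5 / math.sqrt(w), lambda w: 0.0 < w < 4.0),
]

def g(w):
    return density_via_branches(w, f_Y, branches)

print(g(0.25), g(2.25), g(5.0))
```

Below $w = 1$ both branches contribute, which is why the density there is twice the single-branch value.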


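Example 6.9.7. Let $Y$ have the standard normal distribution with the density function

$$f(y) = \frac{1}{\sqrt{2\pi}}e^{-y^2/2}, \quad -\infty < y < \infty.$$

Define the random variable $W$ as $W = Y^2$. In this case, the function $w = y^2$ has two inverse functions on $(-\infty, \infty)$, namely,

$$y = \begin{cases} -\sqrt{w}, & y \le 0, \\ \sqrt{w}, & y > 0. \end{cases}$$

Thus $I_1 = (-\infty, 0]$, $I_2 = (0, \infty)$, and $\psi_1(I_1) = [0, \infty)$, $\psi_2(I_2) = (0, \infty)$. Hence, $\delta_1(w) = 1$, $\delta_2(w) = 1$ if $w \in (0, \infty)$. By applying formula (6.84) we then get

$$g(w) = \frac{1}{\sqrt{2\pi}}e^{-w/2}\left|\frac{-1}{2\sqrt{w}}\right| + \frac{1}{\sqrt{2\pi}}e^{-w/2}\left|\frac{1}{2\sqrt{w}}\right| = \frac{e^{-w/2}}{\sqrt{2\pi}\,\sqrt{w}}, \quad w > 0.$$

This represents the density function of a chi-squared random variable with one degree of freedom.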
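The result of Example 6.9.7 can be cross-checked numerically. The sketch below (hypothetical helper names) evaluates $g(w)$ from (6.84), the usual chi-squared density with $k$ degrees of freedom, $w^{k/2-1}e^{-w/2}/[2^{k/2}\Gamma(k/2)]$ (standard, but not stated in this section), and a central-difference derivative of $P(W \le w) = P(-\sqrt{w} \le Y \le \sqrt{w}) = \mathrm{erf}(\sqrt{w/2})$.

```python
import math

def g_from_684(w):
    """Density of W = Y**2 from (6.84), summing the branches y = -sqrt(w) and y = +sqrt(w)."""
    def f(y):
        return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    jacobian = 1.0 / (2.0 * math.sqrt(w))   # |d psi_i^{-1}(w)/dw| is the same on both branches
    return f(-math.sqrt(w)) * jacobian + f(math.sqrt(w)) * jacobian

def chi2_density(w, k=1):
    """Chi-squared density with k degrees of freedom, w > 0."""
    return w ** (k / 2 - 1) * math.exp(-w / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def cdf_W(w):
    """P(W <= w) = P(-sqrt(w) <= Y <= sqrt(w)) = erf(sqrt(w / 2)) for w >= 0."""
    return math.erf(math.sqrt(w / 2.0))

for w in (0.3, 1.0, 2.5, 6.0):
    h = 1e-5
    slope = (cdf_W(w + h) - cdf_W(w - h)) / (2 * h)   # numerical derivative of the CDF
    print(w, g_from_684(w), chi2_density(w), slope)
```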


6.9.3. The Riemann-Stieltjes Integral Representation of the Expected Value

Let $X$ be a random variable with a cumulative distribution function $F(x)$. Suppose that $h(x)$ is Riemann-Stieltjes integrable with respect to $F(x)$ on $(-\infty, \infty)$. Then the expected value of $h(X)$ is defined as

$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\,dF(x). \tag{6.85}$$

Formula (6.85) provides a unified representation of expected values for both discrete and continuous random variables.

If $X$ is a continuous random variable with a density function $f(x)$, that is, $F'(x) = f(x)$, then

$$E[h(X)] = \int_{-\infty}^{\infty} h(x)f(x)\,dx.$$

If, however, $X$ has a discrete distribution with a probability mass function $p(x)$ and takes the values $c_1, c_2, \ldots, c_n$, then

$$E[h(X)] = \sum_{i=1}^{n} h(c_i)\,p(c_i). \tag{6.86}$$


Formula (6.86) follows from applying Theorem 6.8.4. Here, $F(x)$ is a step function with jump discontinuities at $c_1, c_2, \ldots, c_n$ such that

$$F(x) = \begin{cases} 0, & -\infty < x < c_1, \\ p(c_1), & c_1 \le x < c_2, \\ p(c_1) + p(c_2), & c_2 \le x < c_3, \\ \quad\vdots & \quad\vdots \\ \sum_{i=1}^{n-1} p(c_i), & c_{n-1} \le x < c_n, \\ \sum_{i=1}^{n} p(c_i) = 1, & c_n \le x < \infty. \end{cases}$$

Thus, by formula (6.56) we obtain

$$\int_{-\infty}^{\infty} h(x)\,dF(x) = \sum_{i=1}^{n} p(c_i)\,h(c_i). \tag{6.87}$$
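To see (6.87) emerge from the Riemann-Stieltjes definition, one can form the sums $\sum_k h(x_k)[F(x_k) - F(x_{k-1})]$ over a fine partition and observe that only the subintervals containing a jump of $F$ contribute. The support points, probabilities, and $h(x) = x^2$ below are hypothetical illustration values, and tagging each subinterval at its right endpoint is one admissible choice among many.

```python
def step_cdf(x, values, probs):
    """F(x) = P(X <= x) for a discrete X taking values[i] with probability probs[i]."""
    return sum(p for c, p in zip(values, probs) if c <= x)

def riemann_stieltjes_sum(h, F, a, b, N):
    """Sum of h(x_k) * [F(x_k) - F(x_{k-1})] over a uniform partition of [a, b] into N pieces."""
    total = 0.0
    F_prev = F(a)
    for k in range(1, N + 1):
        x_k = a + k * (b - a) / N
        F_k = F(x_k)
        total += h(x_k) * (F_k - F_prev)   # nonzero only where F jumps
        F_prev = F_k
    return total

values = [1.0, 2.0, 5.0]   # hypothetical support points c_1, c_2, c_3
probs = [0.2, 0.5, 0.3]    # hypothetical p(c_1), p(c_2), p(c_3)

def h(x):
    return x ** 2

approx = riemann_stieltjes_sum(h, lambda x: step_cdf(x, values, probs), 0.0, 6.0, 100_000)
exact = sum(p * h(c) for c, p in zip(values, probs))   # right-hand side of (6.87)
print(approx, exact)
```

With these values the right-hand side of (6.87) is $0.2 \cdot 1 + 0.5 \cdot 4 + 0.3 \cdot 25 = 9.7$, and the partition sum approaches it as the mesh shrinks.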


For example, suppose that $X$ has the discrete uniform distribution with $p(x) = 1/n$ for $x = c_1, c_2, \ldots, c_n$. Its cumulative distribution function $F(x)$ can be expressed as

$$F(x) = P[X \le x] = \frac{1}{n}\sum_{i=1}^{n} I(x - c_i),$$

where $I(x - c_i)$ is equal to zero if $x < c_i$ and is equal to one if $x \ge c_i$ $(i = 1, 2, \ldots, n)$. The expected value of $h(X)$ is

$$E[h(X)] = \frac{1}{n}\sum_{i=1}^{n} h(c_i).$$

Example 6.9.8. The moment generating function $\phi(t)$ of a random variable $X$ with a cumulative distribution function $F(x)$ is defined as the expected value of $h_t(X) = e^{tX}$, that is,

$$\phi(t) = \int_{-\infty}^{\infty} e^{tx}\,dF(x),$$

where $t$ is a scalar. If $X$ is a discrete random variable with a probability mass function $p(x)$ and takes the values $c_1, c_2, \ldots, c_n, \ldots$, then by letting $n$ go to infinity in (6.87) we obtain (see also Section 5.6.2)

$$\phi(t) = \sum_{i=1}^{\infty} p(c_i)\,e^{tc_i}.$$
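For a concrete instance of this series, the sketch below takes a Poisson distribution with mean $\lambda$ (an illustrative choice, not one used in the text) and compares the truncated series with the closed form $\exp[\lambda(e^t - 1)]$, a standard result that is assumed here rather than derived.

```python
import math

def mgf_discrete(t, pmf, support):
    """phi(t) = sum over c in support of p(c) * exp(t * c)."""
    return sum(pmf(c) * math.exp(t * c) for c in support)

lam = 3.0   # illustrative Poisson mean

def poisson_pmf(k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

t = 0.4
approx = mgf_discrete(t, poisson_pmf, range(100))   # the series truncated after 100 terms
closed_form = math.exp(lam * (math.exp(t) - 1.0))   # known Poisson MGF
print(approx, closed_form)
```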



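The moment generating function of a continuous random variable with a density function $f(x)$ is

$$\phi(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx. \tag{6.88}$$

The convergence of the integral in (6.88) depends on the choice of the scalar $t$. For example, for the gamma distribution $G(\alpha, \beta)$ with the density function

$$f(x) = \frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \quad \alpha > 0,\ \beta > 0,\ 0 < x < \infty,$$

$\phi(t)$ is of the form

$$\phi(t) = \int_0^{\infty} e^{tx}\,\frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}\,dx = \int_0^{\infty} \frac{x^{\alpha-1}\exp[-x(1-\beta t)/\beta]}{\Gamma(\alpha)\beta^{\alpha}}\,dx.$$

If $1 - \beta t > 0$ and we set $y = x(1-\beta t)/\beta$, we obtain

$$\phi(t) = \int_0^{\infty} \frac{\beta}{(1-\beta t)\Gamma(\alpha)\beta^{\alpha}}\left(\frac{\beta y}{1-\beta t}\right)^{\alpha-1} e^{-y}\,dy.$$

Thus,

$$\phi(t) = \frac{1}{(1-\beta t)^{\alpha}}\int_0^{\infty}\frac{y^{\alpha-1}e^{-y}}{\Gamma(\alpha)}\,dy = (1-\beta t)^{-\alpha},$$

since $\int_0^{\infty} y^{\alpha-1}e^{-y}\,dy = \Gamma(\alpha)$ by the definition of the gamma function. We note that $\phi(t)$ exists for all values of $\alpha$ provided that $1 - \beta t > 0$, that is, $t < 1/\beta$.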
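The closed form $(1 - \beta t)^{-\alpha}$ can be checked by integrating (6.88) numerically. In this sketch the parameter values $\alpha = 3$, $\beta = 0.5$ and the truncation point of the infinite range are assumptions chosen for illustration; truncation is only justified because the integrand decays exponentially when $t < 1/\beta$.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def gamma_pdf(x, alpha, beta):
    """G(alpha, beta) density: x^(alpha-1) e^(-x/beta) / (Gamma(alpha) beta^alpha), x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def mgf_numeric(t, alpha, beta, upper=200.0):
    """Approximate (6.88) by truncating the range at `upper`.

    Only meaningful when t < 1/beta; otherwise the integrand does not decay
    and the true integral diverges.
    """
    return simpson(lambda x: math.exp(t * x) * gamma_pdf(x, alpha, beta), 0.0, upper)

alpha, beta = 3.0, 0.5   # illustrative values; here 1/beta = 2
for t in (-1.0, 0.0, 0.8, 1.5):
    print(t, mgf_numeric(t, alpha, beta), (1.0 - beta * t) ** (-alpha))
```

For $t \ge 1/\beta$ (here $t \ge 2$) the integrand no longer decays, so any truncated value would be meaningless, which mirrors the divergence noted above.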


6.9.4. Chebyshev’s Inequality

In Section 5.6.1 there was a mention of Chebyshev’s inequality. Using the Riemann-Stieltjes integral representation of the expected value, it is now possible to provide a proof for this important inequality.

Theorem 6.9.2. Let $X$ be a random variable (discrete or continuous) with a mean $\mu$ and a variance $\sigma^2$. Then, for any positive constant $r$,

$$P(|X - \mu| \ge r\sigma) \le \frac{1}{r^2}.$$

Proof. By definition, $\sigma^2$ is the expected value of $h(X) = (X - \mu)^2$. Thus,

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2\,dF(x), \tag{6.89}$$

where $F(x)$ is the cumulative distribution function of $X$. Let us now partition $(-\infty, \infty)$ into three disjoint intervals: $(-\infty, \mu - r\sigma]$, $(\mu - r\sigma, \mu + r\sigma)$, $[\mu + r\sigma, \infty)$. The integral in (6.89) can therefore be written as

$$\begin{aligned} \sigma^2 &= \int_{-\infty}^{\mu - r\sigma} (x-\mu)^2\,dF(x) + \int_{\mu - r\sigma}^{\mu + r\sigma} (x-\mu)^2\,dF(x) + \int_{\mu + r\sigma}^{\infty} (x-\mu)^2\,dF(x) \\ &\ge \int_{-\infty}^{\mu - r\sigma} (x-\mu)^2\,dF(x) + \int_{\mu + r\sigma}^{\infty} (x-\mu)^2\,dF(x), \end{aligned} \tag{6.90}$$

since the middle integral is nonnegative. We note that in the first integral in (6.90), $x \le \mu - r\sigma$, so that $x - \mu \le -r\sigma$. Hence, $(x - \mu)^2 \ge r^2\sigma^2$. Also, in the second integral, $x - \mu \ge r\sigma$. Hence, $(x - \mu)^2 \ge r^2\sigma^2$. Consequently,

$$\int_{-\infty}^{\mu - r\sigma} (x-\mu)^2\,dF(x) \ge r^2\sigma^2\int_{-\infty}^{\mu - r\sigma} dF(x) = r^2\sigma^2 P(X \le \mu - r\sigma),$$

$$\int_{\mu + r\sigma}^{\infty} (x-\mu)^2\,dF(x) \ge r^2\sigma^2\int_{\mu + r\sigma}^{\infty} dF(x) = r^2\sigma^2 P(X \ge \mu + r\sigma).$$

From inequality (6.90) we then have

$$\sigma^2 \ge r^2\sigma^2\,[P(X - \mu \le -r\sigma) + P(X - \mu \ge r\sigma)] = r^2\sigma^2 P(|X - \mu| \ge r\sigma),$$

which implies that

$$P(|X - \mu| \ge r\sigma) \le \frac{1}{r^2}.$$ □
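Chebyshev's bound can be compared with exact tail probabilities for simple discrete distributions, where the Riemann-Stieltjes integrals reduce to sums as in (6.87). The distributions below are illustrative assumptions. The three-point distribution on $\{-1, 0, 1\}$ with probabilities $1/(2r^2)$, $1 - 1/r^2$, $1/(2r^2)$ is a standard example showing that the bound $1/r^2$ can be attained, so it cannot be improved in general.

```python
import math

def chebyshev_check(values, probs, r):
    """Return (P(|X - mu| >= r*sigma), 1/r**2) for a discrete X with the given pmf."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    sigma = math.sqrt(var)
    tol = 1e-12   # keeps points with |x - mu| == r*sigma (up to rounding) inside the tail
    tail = sum(p for x, p in zip(values, probs) if abs(x - mu) >= r * sigma - tol)
    return tail, 1.0 / r ** 2

# Discrete uniform distribution on 1, 2, ..., 10.
uniform_vals = list(range(1, 11))
uniform_probs = [0.1] * 10
for r in (1.2, 1.5, 2.0, 3.0):
    print(r, chebyshev_check(uniform_vals, uniform_probs, r))

# Three-point distribution on {-1, 0, 1} for which the bound is attained when r = 2.
r = 2.0
three_point = chebyshev_check(
    [-1.0, 0.0, 1.0], [1 / (2 * r ** 2), 1 - 1 / r ** 2, 1 / (2 * r ** 2)], r
)
print(three_point)
```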

FURTHER READING AND ANNOTATED BIBLIOGRAPHY

DeCani, J. S., and R. A. Stine (1986). “A note on deriving the information matrix for a logistic distribution.” Amer. Statist., 40, 220-222. (This article uses calculus techniques, such as integration and l’Hospital’s rule, in determining the mean and variance of the logistic distribution as was seen in Example 6.9.2.)

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York. (Chap. 5 discusses the Riemann integral; Chap. 16 provides a study of improper integrals.)



EXERCISES 


253 


Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Massachusetts. (Section 8.5 discusses the calibration problem for a simple linear regression model as was seen in Example 6.9.4.)

Hardy, G. H., J. E. Littlewood, and G. Pólya (1952). Inequalities, 2nd ed. Cambridge University Press, Cambridge, England. (This is a classic and often referenced book on inequalities. Chap. 6 is relevant to the present chapter.)

Hartig, D. (1991). “L’Hôpital’s rule via integration.” Amer. Math. Monthly, 98, 156-157.

Khuri, A. I., and G. Casella (2002). “The existence of the first negative moment revisited.” Amer. Statist., 56, 44-47. (This article demonstrates the utility of the comparison test given in Theorem 6.5.3 in showing the existence of the first negative moment of a continuous random variable.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (Section 2.2.2 gives the Riemann-Stieltjes integral representation of the expected value of a random variable as was seen in Section 6.9.3.)

Montgomery, D. C., and E. A. Peck (1982). Introduction to Linear Regression Analysis. Wiley, New York. (The calibration problem for a simple linear regression model is discussed in Section 9.7.)

Moran, P. A. P. (1968). An Introduction to Probability Theory. Clarendon Press, Oxford, England. (Section 5.9 defines moments of a random variable using the Riemann-Stieltjes integral representation of the expected value; Section 5.10 discusses a number of inequalities pertaining to these moments.)

Piegorsch, W. W., and G. Casella (1985). “The existence of the first negative moment.” Amer. Statist., 39, 60-62. (This article gives a sufficient condition for the existence of the first negative moment of a continuous random variable as was seen in Section 6.9.1.)

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, Reading, Massachusetts. (Chap. 9 is concerned with transformations of continuous random variables as was seen in Section 6.9.2. See, in particular, Theorems 2 and 3 in this chapter.)

Taylor, A. E., and W. R. Mann (1972). Advanced Calculus, 2nd ed. Wiley, New York. (Chap. 18 discusses the Riemann integral as well as the Riemann-Stieltjes integral; improper integrals are studied in Chap. 22.)

Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. (Chap. 3 uses the Riemann-Stieltjes integral to define expected values and moments of random variables; functions of random variables are discussed in Section 2.8.)


EXERCISES 
In Mathematics 

6.1. Let $f(x)$ be a bounded function defined on the interval $[a, b]$. Let $P$ be a partition of $[a, b]$. Show that $f(x)$ is Riemann integrable on $[a, b]$ if and only if

$$\inf_P US_P(f) = \sup_P LS_P(f) = \int_a^b f(x)\,dx.$$



6.2. Construct a function that has a countable number of discontinuities in $[0, 1]$ and is Riemann integrable on $[0, 1]$.

6.3. Show that if $f(x)$ is continuous on $[a, b]$ except for a finite number of discontinuities of the first kind (see Definition 3.4.2), then $f(x)$ is Riemann integrable on $[a, b]$.

6.4. Show that the function

$$f(x) = \begin{cases} x\cos\left(\dfrac{\pi}{2x}\right), & 0 < x \le 1, \\ 0, & x = 0, \end{cases}$$

is not of bounded variation on $[0, 1]$.


6.5. Let $f(x)$ and $g(x)$ have continuous derivatives with $g'(x) > 0$. Suppose that $\lim_{x\to\infty} f(x) = \infty$, $\lim_{x\to\infty} g(x) = \infty$, and $\lim_{x\to\infty} f'(x)/g'(x) = L$, where $L$ is finite.

(a) Show that for a given $\epsilon > 0$ there exists a constant $M > 0$ such that for $x > M$,

$$|f'(x) - Lg'(x)| < \epsilon g'(x).$$

Hence, if $A_1$ and $A_2$ are such that $M < A_1 < A_2$, then

$$\left|\int_{A_1}^{A_2}[f'(x) - Lg'(x)]\,dx\right| < \int_{A_1}^{A_2}\epsilon g'(x)\,dx.$$

(b) Deduce from (a) that

$$\left|\frac{f(A_2)}{g(A_2)} - L\right| < \epsilon + \frac{|f(A_1)|}{g(A_2)} + |L|\,\frac{g(A_1)}{g(A_2)}.$$

(c) Make use of (b) to show that for a sufficiently large $A_2$,

$$\left|\frac{f(A_2)}{g(A_2)} - L\right| < 3\epsilon,$$

and hence $\lim_{x\to\infty} f(x)/g(x) = L$.

[Note: This problem verifies l’Hospital’s rule for the $\infty/\infty$ indeterminate form by using integration properties without relying on Cauchy’s mean value theorem as in Section 4.2 (see Hartig, 1991).]


6.6. Show that if $f(x)$ is continuous on $[a, b]$, and if $g(x)$ is a nonnegative Riemann integrable function on $[a, b]$ such that $\int_a^b g(x)\,dx > 0$, then there exists a constant $c$, $a \le c \le b$, such that

$$\frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} = f(c).$$


6.7. Prove Theorem 6.5.2. 

6.8. Prove Theorem 6.5.3. 

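6.9. Suppose that $f(x)$ is a positive monotone decreasing function defined on $[a, \infty)$ such that $f(x) \to 0$ as $x \to \infty$. Show that if $f(x)$ is Riemann-Stieltjes integrable with respect to $g(x)$ on $[a, b]$ for every $b > a$, where $g(x)$ is bounded on $[a, \infty)$, then the integral $\int_a^{\infty} f(x)\,dg(x)$ is convergent.

6.10. Show that $\lim_{n\to\infty}\int_0^{n\pi}|(\sin x)/x|\,dx = \infty$, where $n$ is a positive integer.

[Hint: Show first that

$$\int_0^{n\pi}\left|\frac{\sin x}{x}\right| dx = \int_0^{\pi}\sin x\left[\frac{1}{x} + \frac{1}{x+\pi} + \cdots + \frac{1}{x+(n-1)\pi}\right] dx.$$
]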


6.11. Apply Maclaurin’s integral test to determine convergence or divergence of the following series:

(a) $\sum \dfrac{\log n}{\cdots}$,

(b) $\sum_{n=1}^{\infty}\dfrac{n+4}{2n^3+1}$,

(c) …

6.12. Consider the sequence $\{f_n(x)\}_{n=1}^{\infty}$, where $f_n(x) = nx/(1 + nx^2)$, $x > 0$. Find the limit of $\int_{\cdots}^{\cdots} f_n(x)\,dx$ as $n \to \infty$.

6.13. Consider the improper integral $\int_0^1 x^{m-1}(1-x)^{n-1}\,dx$.

(a) Show that the integral converges if $m > 0$, $n > 0$. In this case, the function defined as

$$B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx$$

is called the beta function.

(b) Show that

$$B(m, n) = 2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$

(c) Show that

$$B(m, n) = \int_0^{\infty}\frac{x^{m-1}}{(1+x)^{m+n}}\,dx = \int_0^{\infty}\frac{x^{n-1}}{(1+x)^{m+n}}\,dx.$$

(d) Show that

$$B(m, n) = \int_0^1\frac{x^{m-1} + x^{n-1}}{(1+x)^{m+n}}\,dx.$$

6.14. Determine whether each of the following integrals is convergent or divergent:


(a) 

(b) 

(c) 

(d) 


L 


CO 


0 \/l 


dx 
+ X' 
dx 


(1 


£ 


dx 


L 


0 (1 -X-") 
dx 


3^V3 


0 Vx (1 + 2x) 


6.15. Let $f_1(x)$ and $f_2(x)$ be bounded on $[a, b]$ and $g(x)$ be monotone increasing on $[a, b]$. If $f_1(x)$ and $f_2(x)$ are Riemann-Stieltjes integrable with respect to $g(x)$ on $[a, b]$, then show that $f_1(x)f_2(x)$ is also Riemann-Stieltjes integrable with respect to $g(x)$ on $[a, b]$.


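6.16. Let $f(x)$ be a function whose first $n$ derivatives are continuous on $[a, b]$, and let

$$h_n(x) = f(b) - f(x) - (b-x)f'(x) - \cdots - \frac{(b-x)^{n-1}}{(n-1)!}f^{(n-1)}(x).$$

Show that

$$h_n(a) = \frac{1}{(n-1)!}\int_a^b (b-x)^{n-1}f^{(n)}(x)\,dx$$

and hence

$$f(b) = f(a) + (b-a)f'(a) + \cdots + \frac{(b-a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{1}{(n-1)!}\int_a^b (b-x)^{n-1}f^{(n)}(x)\,dx.$$

This represents Taylor’s expansion of $f(x)$ around $x = a$ (see Section 4.3) with a remainder $R_n$ given by

$$R_n = \frac{1}{(n-1)!}\int_a^b (b-x)^{n-1}f^{(n)}(x)\,dx.$$

[Note: This form of Taylor’s theorem has the advantage of providing an exact formula for $R_n$, which does not involve an undetermined intermediate number, as was seen in Section 4.3.]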
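The identity in Exercise 6.16 can be checked numerically before it is proved. This sketch assumes $f(x) = \sin x$, $a = 0.3$, $b = 2$, and $n = 5$ (all illustrative), evaluates the remainder integral with Simpson's rule, and confirms that the Taylor polynomial plus $R_n$ reproduces $f(b)$. The helper names are hypothetical.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0

def taylor_with_integral_remainder(derivs, a, b, n):
    """Return (Taylor polynomial of degree n - 1 about a, evaluated at b; remainder R_n).

    derivs[k] must be the k-th derivative f^(k), for k = 0, 1, ..., n.
    """
    poly = sum((b - a) ** k / math.factorial(k) * derivs[k](a) for k in range(n))
    remainder = simpson(lambda x: (b - x) ** (n - 1) * derivs[n](x), a, b) / math.factorial(n - 1)
    return poly, remainder

# Illustrative choice: f = sin, whose derivatives cycle sin, cos, -sin, -cos.
sin_derivs = [math.sin, math.cos,
              lambda x: -math.sin(x), lambda x: -math.cos(x),
              math.sin, math.cos]
a, b, n = 0.3, 2.0, 5
poly, remainder = taylor_with_integral_remainder(sin_derivs, a, b, n)
print(poly + remainder, math.sin(b))
```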

6.17. Suppose that $f(x)$ is monotone and its derivative $f'(x)$ is Riemann integrable on $[a, b]$. Let $g(x)$ be continuous on $[a, b]$. Show that there exists a number $c$, $a \le c \le b$, such that

$$\int_a^b f(x)g(x)\,dx = f(a)\int_a^c g(x)\,dx + f(b)\int_c^b g(x)\,dx.$$

6.18. Deduce from Exercise 6.17 that for any $b > a > 0$,

$$\left|\int_a^b \frac{\sin x}{x}\,dx\right| \le \frac{4}{a}.$$


6.19. Prove Lemma 6.7.1. 



6.20. Show that if $f_1(x), f_2(x), \ldots, f_n(x)$ are such that $|f_i(x)|^p$ is Riemann integrable on $[a, b]$ for $i = 1, 2, \ldots, n$, where $1 \le p < \infty$, then

$$\left[\int_a^b\left|\sum_{i=1}^n f_i(x)\right|^p dx\right]^{1/p} \le \sum_{i=1}^n\left[\int_a^b |f_i(x)|^p\,dx\right]^{1/p}.$$


6.21. Prove Theorem 6.8.1. 

6.22. Prove Theorem 6.8.2. 


In Statistics 

6.23. Show that the integral in formula (6.66) is divergent for $k > 1$.

6.24. Consider the random variable $X$ that has the logistic distribution described in Example 6.9.2. Show that $\mathrm{Var}(X) = \pi^2/3$.

6.25. Let $\{f_n(x)\}_{n=1}^{\infty}$ be a family of density functions defined by

$$f_n(x) = \frac{|\log x|^{-n}}{\int_0^A |\log t|^{-n}\,dt}, \quad 0 < x < A,$$

where $A \in (0, 1)$.

(a) Show that condition (6.72) of Theorem 6.9.1 is not satisfied by any $f_n(x)$, $n \ge 1$.

(b) Show that when $n = 1$, $E(X_1^{-1})$ does not exist, where $X_1$ is a random variable with the density function $f_1(x)$.

(c) Show that for $n > 1$, $E(X_n^{-1})$ exists, where $X_n$ is a random variable with the density function $f_n(x)$.

6.26. Let $X$ be a random variable with a continuous density function $f(x)$ on $(0, \infty)$. Suppose that $f(x)$ is bounded near zero. Show that $E(X^{-\alpha})$ exists, where $\alpha \in (0, 1)$.

6.27. Let $X$ be a random variable with a continuous density function $f(x)$ on $(0, \infty)$. If $\lim_{x\to 0^+} f(x)/x^{\alpha}$ is equal to a positive constant $k$ for some $\alpha > 0$, then show that $E[X^{-(1+\alpha)}]$ does not exist.



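6.28. The random variable $Y$ has the $t$-distribution with $n$ degrees of freedom. Its density function is given by

$$f(y) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{y^2}{n}\right)^{-(n+1)/2}, \quad -\infty < y < \infty,$$

where $\Gamma(m)$ is the gamma function $\int_0^{\infty} e^{-x}x^{m-1}\,dx$, $m > 0$. Find the density function of $W = |Y|$.

6.29. Let $X$ be a random variable with a mean $\mu$ and a variance $\sigma^2$.

(a) Show that Chebyshev’s inequality can be expressed as

$$P(|X - \mu| < r\sigma) \ge 1 - \frac{1}{r^2},$$

where $r$ is any positive constant.

(b) Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of independent and identically distributed random variables. If the common mean and variance of the $X_i$’s are $\mu$ and $\sigma^2$, respectively, then show that

$$P(|\bar{X}_n - \mu| \ge r) \le \frac{\sigma^2}{nr^2},$$

where $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$ and $r$ is any positive constant.

(c) Deduce from (b) that $\bar{X}_n$ converges in probability to $\mu$ as $n \to \infty$, that is, for every $\epsilon > 0$,

$$P(|\bar{X}_n - \mu| \ge \epsilon) \to 0 \quad \text{as } n \to \infty.$$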


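6.30. Let $X$ be a random variable with a cumulative distribution function $F(x)$. Let $\mu'_k$ be its $k$th noncentral moment,

$$\mu'_k = E(X^k) = \int_{-\infty}^{\infty} x^k\,dF(x).$$

Let $\nu_k$ be the $k$th absolute moment of $X$,

$$\nu_k = E(|X|^k) = \int_{-\infty}^{\infty} |x|^k\,dF(x).$$

Suppose that $\nu_k$ exists for $k = 1, 2, \ldots, n$.

(a) Show that $\nu_k^2 \le \nu_{k-1}\nu_{k+1}$, $k = 1, 2, \ldots, n-1$. [Hint: For any $u$ and $v$,

$$0 \le \int_{-\infty}^{\infty}\left[u|x|^{(k-1)/2} + v|x|^{(k+1)/2}\right]^2 dF(x) = u^2\nu_{k-1} + 2uv\,\nu_k + v^2\nu_{k+1}.$$
]

(b) Deduce from (a) that

$$\nu_1 \le \nu_2^{1/2} \le \nu_3^{1/3} \le \cdots \le \nu_n^{1/n}.$$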



CHAPTER 7 


Multidimensional Calculus


In the previous chapters we have mainly dealt with real-valued functions of a single variable $x$. In this chapter we extend the notions of limits, continuity, differentiation, and integration to multivariable functions, that is, functions of several variables. These functions can be real-valued or possibly vector-valued. More specifically, if $R^n$ denotes the $n$-dimensional Euclidean space, $n \ge 1$, then we shall in general consider functions defined on a set $D \subset R^n$ and having values in $R^m$, $m \ge 1$. Such functions are represented symbolically as $\mathbf{f}: D \to R^m$, where for $\mathbf{x} = (x_1, x_2, \ldots, x_n)' \in D$,

$$\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x})]'$$

and $f_i(\mathbf{x})$ is a real-valued function of $x_1, x_2, \ldots, x_n$ $(i = 1, 2, \ldots, m)$.

Even though the basic framework of the methodology in this chapter is 
general and applies in any number of dimensions, most of the examples are 
associated with two- or three-dimensional spaces. At this stage, it would be 
helpful to review the basic concepts given in Chapters 1 and 2. This can 
facilitate the understanding of the methodology and its development in a 
multidimensional environment. 


7.1. SOME BASIC DEFINITIONS 

Some of the concepts described in Chapter 1 pertained to one-dimensional Euclidean spaces. In this section we extend these concepts to higher-dimensional Euclidean spaces.

Any point $\mathbf{x}$ in $R^n$ can be represented as a column vector of the form $(x_1, x_2, \ldots, x_n)'$, where $x_i$ is the $i$th element of $\mathbf{x}$ $(i = 1, 2, \ldots, n)$. The Euclidean norm of $\mathbf{x}$ was defined in Chapter 2 (see Definition 2.1.4) as $\|\mathbf{x}\|_2 = \left(\sum_{i=1}^n x_i^2\right)^{1/2}$. For simplicity we shall drop the subindex 2 and denote this norm by $\|\mathbf{x}\|$.





262 


MULTIDIMENSIONAL CALCULUS 


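Let $\mathbf{x}_0 \in R^n$. A neighborhood $N_r(\mathbf{x}_0)$ of $\mathbf{x}_0$ is a set of points in $R^n$ that lie within some distance, say $r$, from $\mathbf{x}_0$, that is,

$$N_r(\mathbf{x}_0) = \{\mathbf{x} \in R^n \mid \|\mathbf{x} - \mathbf{x}_0\| < r\}.$$

If $\mathbf{x}_0$ is deleted from $N_r(\mathbf{x}_0)$, we obtain the so-called deleted neighborhood of $\mathbf{x}_0$, which we denote by $N_r^d(\mathbf{x}_0)$.

A point $\mathbf{x}_0$ in $R^n$ is a limit point of a set $A \subset R^n$ if every neighborhood of $\mathbf{x}_0$ contains an element $\mathbf{x}$ of $A$ such that $\mathbf{x} \ne \mathbf{x}_0$, that is, every deleted neighborhood of $\mathbf{x}_0$ contains points of $A$.

A set $A \subset R^n$ is closed if every limit point of $A$ belongs to $A$.

A point $\mathbf{x}_0$ in $R^n$ is an interior point of a set $A \subset R^n$ if there exists an $r > 0$ such that $N_r(\mathbf{x}_0) \subset A$.

A set $A \subset R^n$ is open if for every point $\mathbf{x}$ in $A$ there exists a neighborhood $N_r(\mathbf{x})$ that is contained in $A$. Thus $A$ is open if it consists entirely of interior points.

A point $\mathbf{p} \in R^n$ is a boundary point of a set $A \subset R^n$ if every neighborhood of $\mathbf{p}$ contains points of $A$ as well as points of $\bar{A}$, the complement of $A$ with respect to $R^n$. The set of all boundary points of $A$ is called its boundary and is denoted by $\mathrm{Br}(A)$.

A set $A \subset R^n$ is bounded if there exists an $r > 0$ such that $\|\mathbf{x}\| \le r$ for all $\mathbf{x}$ in $A$.

Let $\mathbf{g}: J^+ \to R^n$ be a vector-valued function defined on the set $J^+$ of all positive integers. Let $\mathbf{g}(i) = \mathbf{a}_i$, $i \ge 1$. Then $\{\mathbf{a}_i\}_{i=1}^{\infty}$ represents a sequence of points in $R^n$. By a subsequence of $\{\mathbf{a}_i\}_{i=1}^{\infty}$ we mean a sequence $\{\mathbf{a}_{k_i}\}_{i=1}^{\infty}$ such that $k_1 < k_2 < \cdots < k_i < \cdots$ and $k_i \ge i$ for $i \ge 1$ (see Definition 5.1.1).

A sequence $\{\mathbf{a}_i\}_{i=1}^{\infty}$ converges to a point $\mathbf{c} \in R^n$ if for a given $\epsilon > 0$ there exists an integer $N$ such that $\|\mathbf{a}_i - \mathbf{c}\| < \epsilon$ whenever $i > N$. This is written symbolically as $\lim_{i\to\infty}\mathbf{a}_i = \mathbf{c}$, or $\mathbf{a}_i \to \mathbf{c}$ as $i \to \infty$.

A sequence $\{\mathbf{a}_i\}_{i=1}^{\infty}$ is bounded if there exists a number $K > 0$ such that $\|\mathbf{a}_i\| \le K$ for all $i$.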


7.2. LIMITS OF A MULTIVARIABLE FUNCTION

We recall from Chapter 3 that for a function of a single variable $x$, its limit at a point is considered when $x$ approaches the point from two directions, left and right. Here, for a function of several variables, say $x_1, x_2, \ldots, x_n$, its limit at a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)'$ is considered when $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ approaches $\mathbf{a}$ in any possible way. Thus when $n > 1$ there are infinitely many ways in which $\mathbf{x}$ can approach $\mathbf{a}$.

Definition 7.2.1. Let $\mathbf{f}: D \to R^m$, where $D \subset R^n$. Then $\mathbf{f}(\mathbf{x})$ is said to have a limit $\mathbf{L} = (L_1, L_2, \ldots, L_m)'$ as $\mathbf{x}$ approaches $\mathbf{a}$, written symbolically as $\mathbf{x} \to \mathbf{a}$, where $\mathbf{a}$ is a limit point of $D$, if for a given $\epsilon > 0$ there exists a $\delta > 0$ such that $\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \epsilon$ for all $\mathbf{x}$ in $D \cap N_\delta^d(\mathbf{a})$, where $N_\delta^d(\mathbf{a})$ is a deleted neighborhood of $\mathbf{a}$ of radius $\delta$. If it exists, this limit is written symbolically as $\lim_{\mathbf{x}\to\mathbf{a}}\mathbf{f}(\mathbf{x}) = \mathbf{L}$. □

Note that whenever a limit of $\mathbf{f}(\mathbf{x})$ exists, its value must be the same no matter how $\mathbf{x}$ approaches $\mathbf{a}$. It is important here to understand the meaning of “$\mathbf{x}$ approaches $\mathbf{a}$.” By this we do not necessarily mean that $\mathbf{x}$ moves along a straight line leading into $\mathbf{a}$. Rather, we mean that $\mathbf{x}$ moves closer and closer to $\mathbf{a}$ along any curve that goes through $\mathbf{a}$.

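Example 7.2.1. Consider the behavior of the function

$$f(x_1, x_2) = \frac{x_1^3 - x_2^3}{x_1^2 + x_2^2}$$

as $\mathbf{x} = (x_1, x_2)' \to \mathbf{0}$, where $\mathbf{0} = (0, 0)'$. This function is defined everywhere in $R^2$ except at $\mathbf{0}$. It is convenient here to represent the point $\mathbf{x}$ using polar coordinates, $r$ and $\theta$, such that $x_1 = r\cos\theta$, $x_2 = r\sin\theta$, $r > 0$, $0 \le \theta \le 2\pi$. We then have

$$f(x_1, x_2) = \frac{r^3\cos^3\theta - r^3\sin^3\theta}{r^2\cos^2\theta + r^2\sin^2\theta} = r(\cos^3\theta - \sin^3\theta).$$

Since $|\cos^3\theta - \sin^3\theta| \le 2$, we have $|f(x_1, x_2)| \le 2r$. Since $\mathbf{x} \to \mathbf{0}$ if and only if $r \to 0$, $\lim_{\mathbf{x}\to\mathbf{0}} f(x_1, x_2) = 0$ no matter how $\mathbf{x}$ approaches $\mathbf{0}$.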

Example 7.2.2. Consider the function

$$f(x_1, x_2) = \frac{x_1x_2}{x_1^2 + x_2^2}.$$

Using polar coordinates again, we obtain

$$f(x_1, x_2) = \cos\theta\,\sin\theta,$$

which depends on $\theta$, but not on $r$. Since $\theta$ can have infinitely many values, $f(x_1, x_2)$ cannot be made close to any one constant $L$ no matter how small $r$ is. Thus the limit of this function does not exist as $\mathbf{x} \to \mathbf{0}$.

Example 7.2.3. Let $f(x_1, x_2)$ be defined as

$$f(x_1, x_2) = \frac{x_2(x_1^2 + x_2^2)}{x_2^2 + (x_1^2 + x_2^2)^2}.$$

This function is defined everywhere in $R^2$ except at $(0, 0)'$. On the line $x_1 = 0$, $f(0, x_2) = x_2^3/(x_2^2 + x_2^4)$, which goes to zero as $x_2 \to 0$. When $x_2 = 0$, $f(x_1, 0) = 0$ for $x_1 \ne 0$; hence, $f(x_1, 0) \to 0$ as $x_1 \to 0$. Furthermore, for any other straight line $x_2 = tx_1$ $(t \ne 0)$ through the origin we have

$$f(x_1, tx_1) = \frac{tx_1(x_1^2 + t^2x_1^2)}{t^2x_1^2 + (x_1^2 + t^2x_1^2)^2} = \frac{tx_1(1 + t^2)}{t^2 + x_1^2(1 + t^2)^2}, \quad x_1 \ne 0,$$

which has a limit equal to zero as $x_1 \to 0$. We conclude that the limit of $f(x_1, x_2)$ is zero as $\mathbf{x} \to \mathbf{0}$ along any straight line through the origin. However, $f(x_1, x_2)$ does not have a limit as $\mathbf{x} \to \mathbf{0}$. For example, along the circle $x_2 = x_1^2 + x_2^2$ that passes through the origin,

$$f(x_1, x_2) = \frac{x_2^2}{x_2^2 + x_2^2} = \frac{1}{2}, \quad x_1^2 + x_2^2 \ne 0.$$

Hence, $f(x_1, x_2) \to \frac{1}{2} \ne 0$.

This example demonstrates that a function may not have a limit as $\mathbf{x} \to \mathbf{a}$ even though its limit exists for approaches toward $\mathbf{a}$ along straight lines.
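Example 7.2.3 is easy to explore numerically. The sketch evaluates $f$ at points approaching the origin along the line $x_2 = 2x_1$ and along the circle $x_1^2 + x_2^2 = x_2$; the slope 2 and the parametrization of the circle are illustrative choices. Finite samples cannot prove that a limit fails to exist, but they do show the two paths settling at different values.

```python
import math

def f(x1, x2):
    """Example 7.2.3: x2 (x1^2 + x2^2) / (x2^2 + (x1^2 + x2^2)^2), defined away from the origin."""
    s = x1 * x1 + x2 * x2
    return x2 * s / (x2 * x2 + s * s)

steps = (1e-2, 1e-4, 1e-6)

# Along the straight line x2 = t * x1 with t = 2.
line_values = [f(x1, 2.0 * x1) for x1 in steps]

# Along the circle x1^2 + x2^2 = x2 (center (0, 1/2), radius 1/2), parametrized by
# x1 = sin(phi) / 2, x2 = sin(phi / 2)**2, which tends to the origin as phi -> 0.
# sin(phi/2)**2 equals (1 - cos(phi)) / 2 but avoids cancellation for small phi.
circle_values = [f(0.5 * math.sin(phi), math.sin(0.5 * phi) ** 2) for phi in steps]

print(line_values)    # shrinks toward 0
print(circle_values)  # stays at 1/2
```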


7.3. CONTINUITY OF A MULTIVARIABLE FUNCTION 

The notion of continuity for a function of several variables is much the same 
as that for a function of a single variable. 

Definition 7.3.1. Let $\mathbf{f}: D \to R^m$, where $D \subset R^n$, and let $\mathbf{a} \in D$. Then $\mathbf{f}(\mathbf{x})$ is continuous at $\mathbf{a}$ if

$$\lim_{\mathbf{x}\to\mathbf{a}}\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}),$$

where $\mathbf{x}$ remains in $D$ as it approaches $\mathbf{a}$. This is equivalent to stating that for a given $\epsilon > 0$ there exists a $\delta > 0$ such that

$$\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a})\| < \epsilon$$

for all $\mathbf{x} \in D \cap N_\delta(\mathbf{a})$.

If $\mathbf{f}(\mathbf{x})$ is continuous at every point $\mathbf{x}$ in $D$, then it is said to be continuous in $D$. In particular, if $\mathbf{f}(\mathbf{x})$ is continuous in $D$ and if $\delta$ (in the definition of continuity) depends only on $\epsilon$ (that is, $\delta$ is the same for all points in $D$ for the given $\epsilon$), then $\mathbf{f}(\mathbf{x})$ is said to be uniformly continuous in $D$. □

We now present several theorems that provide some important properties of multivariable continuous functions. These theorems are analogous to those given in Chapter 3. Let us first consider the following lemmas (the proofs are left to the reader):

Lemma 7.3.1. Every bounded sequence in $R^n$ has a convergent subsequence.

This lemma is analogous to Theorem 5.1.4.

Lemma 7.3.2. Suppose that $f, g: D \to R$ are real-valued continuous functions, where $D \subset R^n$. Then we have the following:

1. $f + g$ and $fg$ are continuous in $D$.

2. $|f|$ is continuous in $D$.

3. $1/f$ is continuous in $D$ provided that $f(\mathbf{x}) \ne 0$ for all $\mathbf{x}$ in $D$.

This lemma is analogous to Theorem 3.4.1.

Lemma 7.3.3. Suppose that $\mathbf{f}: D \to R^m$ is continuous, where $D \subset R^n$, and that $\mathbf{g}: G \to R^q$ is also continuous, where $G \subset R^m$ is the image of $D$ under $\mathbf{f}$. Then the composite function $\mathbf{g} \circ \mathbf{f}: D \to R^q$, defined as $\mathbf{g} \circ \mathbf{f}(\mathbf{x}) = \mathbf{g}[\mathbf{f}(\mathbf{x})]$, is continuous in $D$.

This lemma is analogous to Theorem 3.4.2.

Theorem 7.3.1. Let $f: D \to R$ be a real-valued continuous function defined on a closed and bounded set $D \subset R^n$. Then there exist points $\mathbf{p}$ and $\mathbf{q}$ in $D$ for which

$$f(\mathbf{p}) = \sup_{\mathbf{x}\in D} f(\mathbf{x}), \tag{7.1}$$

$$f(\mathbf{q}) = \inf_{\mathbf{x}\in D} f(\mathbf{x}). \tag{7.2}$$

Thus $f(\mathbf{x})$ attains each of its infimum and supremum at least once in $D$.

Proof. Let us first show that $f(\mathbf{x})$ is bounded in $D$. We shall prove this by contradiction. Suppose that $f(\mathbf{x})$ is not bounded in $D$. Then we can find a sequence of points $\{\mathbf{p}_i\}_{i=1}^{\infty}$ in $D$ such that $|f(\mathbf{p}_i)| \ge i$ for $i \ge 1$ and hence $|f(\mathbf{p}_i)| \to \infty$ as $i \to \infty$. Since the terms of this sequence are elements in a bounded set, $\{\mathbf{p}_i\}_{i=1}^{\infty}$ must be a bounded sequence. By Lemma 7.3.1, this sequence has a convergent subsequence $\{\mathbf{p}_{k_i}\}_{i=1}^{\infty}$. Let $\mathbf{p}_0$ be the limit of this subsequence, which is also a limit point of $D$; hence, it belongs to $D$, since $D$ is closed. Now, on one hand, $|f(\mathbf{p}_{k_i})| \to |f(\mathbf{p}_0)|$ as $i \to \infty$, by the continuity of $f(\mathbf{x})$ and hence of $|f(\mathbf{x})|$ [see Lemma 7.3.2(2)]. On the other hand, $|f(\mathbf{p}_{k_i})| \to \infty$. This contradiction shows that $f(\mathbf{x})$ must be bounded in $D$. Consequently, the infimum and supremum of $f(\mathbf{x})$ in $D$ are finite.

Suppose now equality (7.1) does not hold for any $\mathbf{p} \in D$. Then, $M - f(\mathbf{x}) > 0$ for all $\mathbf{x} \in D$, where $M = \sup_{\mathbf{x}\in D} f(\mathbf{x})$. Consequently, $[M - f(\mathbf{x})]^{-1}$ is positive and continuous in $D$ by Lemma 7.3.2(3) and is therefore bounded by the first half of this proof. However, if $\delta > 0$ is any given positive number, then, by the definition of $M$, we can find a point $\mathbf{x}_\delta$ in $D$ for which $f(\mathbf{x}_\delta) > M - \delta$, or

$$\frac{1}{M - f(\mathbf{x}_\delta)} > \frac{1}{\delta}.$$

This implies that $[M - f(\mathbf{x})]^{-1}$ is not bounded, a contradiction, which proves equality (7.1). The proof of equality (7.2) is similar. □

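Theorem 7.3.2. Suppose that $D$ is a closed and bounded set in $R^n$. If $\mathbf{f}: D \to R^m$ is continuous, then it is uniformly continuous in $D$.

Proof. We shall prove this theorem by contradiction. Suppose that $\mathbf{f}$ is not uniformly continuous in $D$. Then there exists an $\epsilon > 0$ such that for every $\delta > 0$ we can find $\mathbf{a}$ and $\mathbf{b}$ in $D$ such that $\|\mathbf{a} - \mathbf{b}\| < \delta$, but $\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \ge \epsilon$. Let us choose $\delta = 1/i$, $i \ge 1$. We can therefore find two sequences $\{\mathbf{a}_i\}_{i=1}^{\infty}$, $\{\mathbf{b}_i\}_{i=1}^{\infty}$ with $\mathbf{a}_i, \mathbf{b}_i \in D$ such that $\|\mathbf{a}_i - \mathbf{b}_i\| < 1/i$, and

$$\|\mathbf{f}(\mathbf{a}_i) - \mathbf{f}(\mathbf{b}_i)\| \ge \epsilon \tag{7.3}$$

for $i \ge 1$. Now, the sequence $\{\mathbf{a}_i\}_{i=1}^{\infty}$ is bounded. Hence, by Lemma 7.3.1, it has a convergent subsequence $\{\mathbf{a}_{k_i}\}_{i=1}^{\infty}$ whose limit, denoted by $\mathbf{a}_0$, is in $D$, since $D$ is closed. Also, since $\mathbf{f}$ is continuous at $\mathbf{a}_0$, we can find a $\lambda > 0$ such that $\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a}_0)\| < \epsilon/2$ if $\|\mathbf{x} - \mathbf{a}_0\| < \lambda$, where $\mathbf{x} \in D$. By the convergence of $\{\mathbf{a}_{k_i}\}_{i=1}^{\infty}$ to $\mathbf{a}_0$, we can choose $k_i$ large enough so that

$$\frac{1}{k_i} < \frac{\lambda}{2} \tag{7.4}$$

and

$$\|\mathbf{a}_{k_i} - \mathbf{a}_0\| < \frac{\lambda}{2}. \tag{7.5}$$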



DERIVATIVES OF A MULTIVARIABLE FUNCTION 


267 


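From (7.5) it follows that

$$\|\mathbf{f}(\mathbf{a}_{k_i}) - \mathbf{f}(\mathbf{a}_0)\| < \frac{\epsilon}{2}. \tag{7.6}$$

Furthermore, since $\|\mathbf{a}_{k_i} - \mathbf{b}_{k_i}\| < 1/k_i$, we can write

$$\|\mathbf{b}_{k_i} - \mathbf{a}_0\| \le \|\mathbf{a}_{k_i} - \mathbf{a}_0\| + \|\mathbf{a}_{k_i} - \mathbf{b}_{k_i}\| < \frac{\lambda}{2} + \frac{1}{k_i} < \lambda.$$

Hence, by the continuity of $\mathbf{f}$ at $\mathbf{a}_0$,

$$\|\mathbf{f}(\mathbf{b}_{k_i}) - \mathbf{f}(\mathbf{a}_0)\| < \frac{\epsilon}{2}. \tag{7.7}$$

From inequalities (7.6) and (7.7) we conclude that whenever $k_i$ satisfies inequalities (7.4) and (7.5),

$$\|\mathbf{f}(\mathbf{a}_{k_i}) - \mathbf{f}(\mathbf{b}_{k_i})\| \le \|\mathbf{f}(\mathbf{a}_{k_i}) - \mathbf{f}(\mathbf{a}_0)\| + \|\mathbf{f}(\mathbf{b}_{k_i}) - \mathbf{f}(\mathbf{a}_0)\| < \epsilon,$$

which contradicts inequality (7.3). This leads us to assert that $\mathbf{f}$ is uniformly continuous in $D$. □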


7.4. DERIVATIVES OF A MULTIVARIABLE FUNCTION 

In this section we generalize the concept of differentiation given in Chapter 4 to a multivariable function $\mathbf{f}: D \to R^m$, where $D \subset R^n$.

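Let $\mathbf{a} = (a_1, a_2, \ldots, a_n)'$ be an interior point of $D$. Suppose that the limit

$$\lim_{h_i\to 0}\frac{\mathbf{f}(a_1, a_2, \ldots, a_i + h_i, \ldots, a_n) - \mathbf{f}(a_1, a_2, \ldots, a_i, \ldots, a_n)}{h_i}$$

exists; then $\mathbf{f}$ is said to have a partial derivative with respect to $x_i$ at $\mathbf{a}$. This derivative is denoted by $\partial\mathbf{f}(\mathbf{a})/\partial x_i$, or just $\mathbf{f}_{x_i}(\mathbf{a})$, $i = 1, 2, \ldots, n$. Hence, partial differentiation with respect to $x_i$ is done in the usual fashion while treating all the remaining variables as constants. For example, if $f: R^3 \to R$ is defined as $f(x_1, x_2, x_3) = x_1x_2^2 + x_2x_3^3$, then at any point $\mathbf{x} \in R^3$ we have

$$\frac{\partial f(\mathbf{x})}{\partial x_1} = x_2^2, \qquad \frac{\partial f(\mathbf{x})}{\partial x_2} = 2x_1x_2 + x_3^3, \qquad \frac{\partial f(\mathbf{x})}{\partial x_3} = 3x_2x_3^2.$$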


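In general, if $f_j$ is the $j$th element of $\mathbf{f}$ $(j = 1, 2, \ldots, m)$, then the terms $\partial f_j(\mathbf{x})/\partial x_i$, for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$, constitute an $m \times n$ matrix called the Jacobian matrix (after Carl Gustav Jacobi, 1804-1851) of $\mathbf{f}$ at $\mathbf{x}$ and is denoted by $\mathbf{J}_{\mathbf{f}}(\mathbf{x})$. If $m = n$, the determinant of $\mathbf{J}_{\mathbf{f}}(\mathbf{x})$ is called the Jacobian determinant; it is sometimes represented as

$$\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x})] = \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)}. \tag{7.8}$$

For example, if $\mathbf{f}: R^3 \to R^2$ is such that

$$\mathbf{f}(x_1, x_2, x_3) = (x_1^2\cos x_2,\ x_2^2 + x_3^2e^{x_1})',$$

then

$$\mathbf{J}_{\mathbf{f}}(x_1, x_2, x_3) = \begin{bmatrix} 2x_1\cos x_2 & -x_1^2\sin x_2 & 0 \\ x_3^2e^{x_1} & 2x_2 & 2x_3e^{x_1} \end{bmatrix}.$$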

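Higher-order partial derivatives of $\mathbf{f}$ are defined similarly. For example, the second-order partial derivative of $\mathbf{f}$ with respect to $x_i$ at $\mathbf{a}$ is defined as

$$\lim_{h_i\to 0}\frac{\mathbf{f}_{x_i}(a_1, a_2, \ldots, a_i + h_i, \ldots, a_n) - \mathbf{f}_{x_i}(a_1, a_2, \ldots, a_i, \ldots, a_n)}{h_i}$$

and is denoted by $\partial^2\mathbf{f}(\mathbf{a})/\partial x_i^2$, or $\mathbf{f}_{x_ix_i}(\mathbf{a})$. Also, the second-order partial derivative of $\mathbf{f}$ with respect to $x_i$ and $x_j$, $i \ne j$, at $\mathbf{a}$ is given by

$$\lim_{h_j\to 0}\frac{\mathbf{f}_{x_i}(a_1, a_2, \ldots, a_j + h_j, \ldots, a_n) - \mathbf{f}_{x_i}(a_1, a_2, \ldots, a_j, \ldots, a_n)}{h_j}$$

and is denoted by $\partial^2\mathbf{f}(\mathbf{a})/\partial x_j\,\partial x_i$, or $\mathbf{f}_{x_jx_i}(\mathbf{a})$, $i \ne j$.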



Under certain conditions, the order in which differentiation with respect to $x_i$ and $x_j$ takes place is irrelevant, that is, $\mathbf{f}_{x_ix_j}(\mathbf{a})$ is identical to $\mathbf{f}_{x_jx_i}(\mathbf{a})$, $i \ne j$. This property is known as the commutative property of partial differentiation and is proved in the next theorem.

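Theorem 7.4.1. Let $\mathbf{f}: D \to R^m$, where $D \subset R^n$, and let $\mathbf{a}$ be an interior point of $D$. Suppose that in a neighborhood of $\mathbf{a}$ the following conditions are satisfied:

1. $\partial\mathbf{f}(\mathbf{x})/\partial x_i$ and $\partial\mathbf{f}(\mathbf{x})/\partial x_j$ exist and are finite $(i, j = 1, 2, \ldots, n,\ i \ne j)$.

2. Of the derivatives $\partial^2\mathbf{f}(\mathbf{x})/\partial x_i\,\partial x_j$, $\partial^2\mathbf{f}(\mathbf{x})/\partial x_j\,\partial x_i$ one exists and is continuous.

Then

$$\frac{\partial^2\mathbf{f}(\mathbf{a})}{\partial x_i\,\partial x_j} = \frac{\partial^2\mathbf{f}(\mathbf{a})}{\partial x_j\,\partial x_i}.$$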

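Proof. Let us suppose that $\partial^2\mathbf{f}(\mathbf{x})/\partial x_j\,\partial x_i$ exists and is continuous in a neighborhood of $\mathbf{a}$. Without loss of generality we assume that $i < j$. If $\partial^2\mathbf{f}(\mathbf{a})/\partial x_i\,\partial x_j$ exists, then it must be equal to the limit

$$\lim_{h_i\to 0}\frac{\mathbf{f}_{x_j}(a_1, a_2, \ldots, a_i + h_i, \ldots, a_n) - \mathbf{f}_{x_j}(a_1, a_2, \ldots, a_i, \ldots, a_n)}{h_i},$$

that is,

$$\lim_{h_i\to 0}\frac{1}{h_i}\left\{\lim_{h_j\to 0}\frac{1}{h_j}\big[\mathbf{f}(a_1, \ldots, a_i + h_i, \ldots, a_j + h_j, \ldots, a_n) - \mathbf{f}(a_1, \ldots, a_i + h_i, \ldots, a_j, \ldots, a_n)\big] - \lim_{h_j\to 0}\frac{1}{h_j}\big[\mathbf{f}(a_1, \ldots, a_i, \ldots, a_j + h_j, \ldots, a_n) - \mathbf{f}(a_1, \ldots, a_i, \ldots, a_j, \ldots, a_n)\big]\right\}. \tag{7.9}$$

Let us denote $\mathbf{f}(x_1, x_2, \ldots, x_j + h_j, \ldots, x_n) - \mathbf{f}(x_1, x_2, \ldots, x_j, \ldots, x_n)$ by $\boldsymbol{\psi}_j(x_1, x_2, \ldots, x_n)$. Then the double limit in (7.9) can be written as

$$\lim_{h_i\to 0}\lim_{h_j\to 0}\frac{1}{h_ih_j}\big[\boldsymbol{\psi}_j(a_1, \ldots, a_i + h_i, \ldots, a_j, \ldots, a_n) - \boldsymbol{\psi}_j(a_1, \ldots, a_i, \ldots, a_j, \ldots, a_n)\big] = \lim_{h_i\to 0}\lim_{h_j\to 0}\frac{1}{h_j}\,\frac{\partial\boldsymbol{\psi}_j(a_1, \ldots, a_i + \theta_ih_i, \ldots, a_j, \ldots, a_n)}{\partial x_i}, \tag{7.10}$$

where $0 < \theta_i < 1$. In formula (7.10) we have applied the mean value theorem (Theorem 4.2.2) to $\boldsymbol{\psi}_j$ as if it were a function of the single variable $x_i$ (since $\partial\mathbf{f}/\partial x_i$, and hence $\partial\boldsymbol{\psi}_j/\partial x_i$, exists in a neighborhood of $\mathbf{a}$). The right-hand side of (7.10) can then be written as

$$\lim_{h_i\to 0}\lim_{h_j\to 0}\frac{1}{h_j}\left[\frac{\partial\mathbf{f}(a_1, \ldots, a_i + \theta_ih_i, \ldots, a_j + h_j, \ldots, a_n)}{\partial x_i} - \frac{\partial\mathbf{f}(a_1, \ldots, a_i + \theta_ih_i, \ldots, a_j, \ldots, a_n)}{\partial x_i}\right] = \lim_{h_i\to 0}\lim_{h_j\to 0}\frac{\partial}{\partial x_j}\left[\frac{\partial\mathbf{f}(a_1, \ldots, a_i + \theta_ih_i, \ldots, a_j + \theta_jh_j, \ldots, a_n)}{\partial x_i}\right], \tag{7.11}$$

where $0 < \theta_j < 1$. In formula (7.11) we have again made use of the mean value theorem, since $\partial^2\mathbf{f}(\mathbf{x})/\partial x_j\,\partial x_i$ exists in the given neighborhood around $\mathbf{a}$. Furthermore, since $\partial^2\mathbf{f}(\mathbf{x})/\partial x_j\,\partial x_i$ is continuous in this neighborhood, the double limit in (7.11) is equal to $\partial^2\mathbf{f}(\mathbf{a})/\partial x_j\,\partial x_i$. This establishes the assertion that the two second-order partial derivatives of $\mathbf{f}$ are equal. □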

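Example 7.4.1. Consider the function $f: R^3 \to R$, where $f(x_1, x_2, x_3) = x_1 e^{x_2} + x_2 \cos x_1$. Then

$$\frac{\partial f(x_1, x_2, x_3)}{\partial x_1} = e^{x_2} - x_2 \sin x_1, \qquad \frac{\partial f(x_1, x_2, x_3)}{\partial x_2} = x_1 e^{x_2} + \cos x_1,$$

$$\frac{\partial^2 f(x_1, x_2, x_3)}{\partial x_2\,\partial x_1} = e^{x_2} - \sin x_1, \qquad \frac{\partial^2 f(x_1, x_2, x_3)}{\partial x_1\,\partial x_2} = e^{x_2} - \sin x_1.$$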
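As a numerical sanity check of Theorem 7.4.1 on this example, the short sketch below (an illustration, not part of the original text) differentiates each analytic first-order partial with respect to the other variable by a central difference. The helper names and the step size `h` are arbitrary choices; both mixed partials should agree with the closed form $e^{x_2} - \sin x_1$ up to discretization error.

```python
import math

# First-order partials of f(x1, x2, x3) = x1*exp(x2) + x2*cos(x1) from Example 7.4.1
def df_dx1(x1, x2):
    return math.exp(x2) - x2 * math.sin(x1)

def df_dx2(x1, x2):
    return x1 * math.exp(x2) + math.cos(x1)

def central_diff(g, t, h=1e-5):
    # Symmetric difference quotient approximating g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

def mixed_partials(x1, x2):
    # d2f/dx2dx1: differentiate df/dx1 with respect to x2
    d2f_dx2dx1 = central_diff(lambda t: df_dx1(x1, t), x2)
    # d2f/dx1dx2: differentiate df/dx2 with respect to x1
    d2f_dx1dx2 = central_diff(lambda t: df_dx2(t, x2), x1)
    return d2f_dx2dx1, d2f_dx1dx2

a, b = mixed_partials(0.7, -0.3)
exact = math.exp(-0.3) - math.sin(0.7)   # closed form from the example
print(a, b, exact)                        # the three numbers agree to many decimal places
```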


7.4.1. The Total Derivative

Let $f(\mathbf{x})$ be a real-valued function defined on a set $D \subset R^n$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. Suppose that $x_1, x_2, \ldots, x_n$ are functions of a single variable $t$. Then $f$ is a function of $t$. The ordinary derivative of $f$ with respect to $t$, namely $df/dt$, is called the total derivative of $f$.





Let us now assume that for the values of $t$ under consideration $dx_i/dt$ exists for $i = 1, 2, \ldots, n$ and that $\partial f(\mathbf{x})/\partial x_i$ exists and is continuous in the interior of $D$ for $i = 1, 2, \ldots, n$. Under these considerations, the total derivative of $f$ is given by

$$\frac{df}{dt} = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x})}{\partial x_i}\,\frac{dx_i}{dt}. \tag{7.12}$$


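To show this we proceed as follows: Let $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$ be increments of $x_1, x_2, \ldots, x_n$ that correspond to an increment $\Delta t$ of $t$. In turn, $f$ will have the increment $\Delta f$. We then have

$$\Delta f = f(x_1 + \Delta x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n) - f(x_1, x_2, \ldots, x_n).$$

This can be written as

$$\begin{aligned}
\Delta f ={}& [f(x_1 + \Delta x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n) - f(x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n)] \\
&+ [f(x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n) - f(x_1, x_2, x_3 + \Delta x_3, \ldots, x_n + \Delta x_n)] \\
&+ [f(x_1, x_2, x_3 + \Delta x_3, \ldots, x_n + \Delta x_n) - f(x_1, x_2, x_3, x_4 + \Delta x_4, \ldots, x_n + \Delta x_n)] \\
&+ \cdots + [f(x_1, x_2, \ldots, x_{n-1}, x_n + \Delta x_n) - f(x_1, x_2, \ldots, x_n)].
\end{aligned}$$

By applying the mean value theorem to the difference in each bracket we obtain

$$\begin{aligned}
\Delta f ={}& \Delta x_1 \frac{\partial f(x_1 + \theta_1 \Delta x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n)}{\partial x_1} + \Delta x_2 \frac{\partial f(x_1, x_2 + \theta_2 \Delta x_2, x_3 + \Delta x_3, \ldots, x_n + \Delta x_n)}{\partial x_2} \\
&+ \Delta x_3 \frac{\partial f(x_1, x_2, x_3 + \theta_3 \Delta x_3, x_4 + \Delta x_4, \ldots, x_n + \Delta x_n)}{\partial x_3} + \cdots + \Delta x_n \frac{\partial f(x_1, x_2, \ldots, x_{n-1}, x_n + \theta_n \Delta x_n)}{\partial x_n},
\end{aligned}$$

where $0 < \theta_i < 1$ for $i = 1, 2, \ldots, n$. Hence,

$$\begin{aligned}
\frac{\Delta f}{\Delta t} ={}& \frac{\Delta x_1}{\Delta t} \frac{\partial f(x_1 + \theta_1 \Delta x_1, x_2 + \Delta x_2, \ldots, x_n + \Delta x_n)}{\partial x_1} + \frac{\Delta x_2}{\Delta t} \frac{\partial f(x_1, x_2 + \theta_2 \Delta x_2, x_3 + \Delta x_3, \ldots, x_n + \Delta x_n)}{\partial x_2} \\
&+ \frac{\Delta x_3}{\Delta t} \frac{\partial f(x_1, x_2, x_3 + \theta_3 \Delta x_3, x_4 + \Delta x_4, \ldots, x_n + \Delta x_n)}{\partial x_3} + \cdots + \frac{\Delta x_n}{\Delta t} \frac{\partial f(x_1, x_2, \ldots, x_{n-1}, x_n + \theta_n \Delta x_n)}{\partial x_n}.
\end{aligned} \tag{7.13}$$

As $\Delta t \to 0$, $\Delta x_i/\Delta t \to dx_i/dt$ (so that $\Delta x_i \to 0$), and the partial derivatives in (7.13), being continuous, tend to $\partial f(\mathbf{x})/\partial x_i$ for $i = 1, 2, \ldots, n$. Thus $\Delta f/\Delta t$ tends to the right-hand side of formula (7.12).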

For example, consider the function $f(x_1, x_2) = x_1^2 - x_2^3$, where $x_1 = e^t \cos t$, $x_2 = \cos t + \sin t$. Then,

$$\begin{aligned}
\frac{df}{dt} &= 2x_1(e^t \cos t - e^t \sin t) - 3x_2^2(-\sin t + \cos t) \\
&= 2e^t \cos t\,(e^t \cos t - e^t \sin t) - 3(\cos t + \sin t)^2(-\sin t + \cos t) \\
&= (\cos t - \sin t)(2e^{2t} \cos t - 6 \sin t \cos t - 3).
\end{aligned}$$

Of course, the same result could have been obtained by expressing $f$ directly as a function of $t$ via $x_1$ and $x_2$ and then differentiating it with respect to $t$.
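Formula (7.12) can be checked numerically for this example. In the sketch below (hypothetical helper names, illustrative step size), `df_dt_chain` evaluates the right-hand side of (7.12), `df_dt_closed` is the factored expression above, and `df_dt_numeric` differentiates $f[x_1(t), x_2(t)]$ directly, which is the alternative route mentioned in the text.

```python
import math

def x1(t):
    return math.exp(t) * math.cos(t)

def x2(t):
    return math.cos(t) + math.sin(t)

def f(u, v):
    # f(x1, x2) = x1^2 - x2^3
    return u**2 - v**3

def df_dt_chain(t):
    # Formula (7.12): df/dt = (df/dx1)(dx1/dt) + (df/dx2)(dx2/dt)
    dx1 = math.exp(t) * (math.cos(t) - math.sin(t))
    dx2 = -math.sin(t) + math.cos(t)
    return 2 * x1(t) * dx1 - 3 * x2(t)**2 * dx2

def df_dt_closed(t):
    # Factored form obtained in the text
    c, s = math.cos(t), math.sin(t)
    return (c - s) * (2 * math.exp(2 * t) * c - 6 * s * c - 3)

def df_dt_numeric(t, h=1e-6):
    # Differentiate the composite f(x1(t), x2(t)) directly by a central difference
    g = lambda u: f(x1(u), x2(u))
    return (g(t + h) - g(t - h)) / (2 * h)

for t in (0.0, 0.5, 1.3):
    print(t, df_dt_chain(t), df_dt_closed(t), df_dt_numeric(t))
```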

We can generalize formula (7.12) by assuming that each of $x_1, x_2, \ldots, x_n$ is a function of several variables including the variable $t$. In this case, we need to consider the partial derivative $\partial f/\partial t$, which can be similarly shown to have the value

$$\frac{\partial f}{\partial t} = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x})}{\partial x_i}\,\frac{\partial x_i}{\partial t}. \tag{7.14}$$

In general, the expression

$$df = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x})}{\partial x_i}\,dx_i \tag{7.15}$$

is called the total differential of $f$ at $\mathbf{x}$.





Example 7.4.2. Consider the equation $f(x_1, x_2) = 0$, which in general represents a relation between $x_1$ and $x_2$. It may or may not define $x_2$ as a function of $x_1$. When it does, $x_2$ is said to be an implicit function of $x_1$, and we write $x_2 = g(x_1)$. Consequently, $f[x_1, g(x_1)]$ will be identically equal to zero. Hence, $f[x_1, g(x_1)]$, being a function of one variable $x_1$, will have a total derivative identically equal to zero. By applying formula (7.12) with $t = x_1$ we obtain

$$\frac{df}{dx_1} = \frac{\partial f}{\partial x_1} + \frac{\partial f}{\partial x_2}\,\frac{dx_2}{dx_1} = 0.$$

If $\partial f/\partial x_2 \neq 0$, then the derivative of $x_2$ is given by

$$\frac{dx_2}{dx_1} = \frac{-\partial f/\partial x_1}{\partial f/\partial x_2}. \tag{7.16}$$

In particular, if $f(x_1, x_2) = 0$ is of the form $x_1 - h(x_2) = 0$, and if this equation can be solved uniquely for $x_2$ in terms of $x_1$, then $x_2$ represents the inverse function of $h$, that is, $x_2 = h^{-1}(x_1)$. Thus according to formula (7.16),

$$\frac{dh^{-1}}{dx_1} = \frac{1}{dh/dx_2}.$$

This agrees with the formula for the derivative of the inverse function given in Theorem 4.2.4.
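Formula (7.16) can be illustrated with a relation whose inverse is known in closed form. The sketch below assumes, purely for illustration, $h(x_2) = e^{x_2}$, so that $x_2 = h^{-1}(x_1) = \log x_1$; the slope given by (7.16) should then match both $1/(dh/dx_2)$ and the known derivative $1/x_1$ of the logarithm.

```python
import math

# Relation f(x1, x2) = x1 - h(x2) = 0 with the illustrative choice h(x2) = exp(x2),
# so that x2 = h^{-1}(x1) = log(x1).
def df_dx1(x1, x2):
    return 1.0

def df_dx2(x1, x2):
    return -math.exp(x2)

def implicit_slope(x1, x2):
    # Formula (7.16): dx2/dx1 = -(df/dx1)/(df/dx2), valid when df/dx2 != 0
    return -df_dx1(x1, x2) / df_dx2(x1, x2)

x1 = 2.5
x2 = math.log(x1)               # a point on the curve x1 = exp(x2)
print(implicit_slope(x1, x2))   # slope from (7.16)
print(1 / math.exp(x2))         # 1/(dh/dx2)
print(1 / x1)                   # derivative of log at x1
```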


7.4.2. Directional Derivatives

Let $\mathbf{f}: D \to R^m$, where $D \subset R^n$, and let $\mathbf{v}$ be a unit vector in $R^n$ (that is, a vector whose length is equal to one), which represents a certain direction in the $n$-dimensional Euclidean space. By definition, the directional derivative of $\mathbf{f}$ at a point $\mathbf{x}$ in the interior of $D$ in the direction of $\mathbf{v}$ is given by the limit

$$\lim_{h \to 0} \frac{\mathbf{f}(\mathbf{x} + h\mathbf{v}) - \mathbf{f}(\mathbf{x})}{h},$$

if it exists. In particular, if $\mathbf{v} = \mathbf{e}_i$, the unit vector in the direction of the $i$th coordinate axis, then the directional derivative of $\mathbf{f}$ in the direction of $\mathbf{v}$ is just the partial derivative of $\mathbf{f}$ with respect to $x_i$ ($i = 1, 2, \ldots, n$).

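Lemma 7.4.1. Let $\mathbf{f}: D \to R^m$, where $D \subset R^n$. If the partial derivatives $\partial f_j/\partial x_i$ exist at a point $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ in the interior of $D$ for $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$, where $f_j$ is the $j$th element of $\mathbf{f}$, then the directional derivative of $\mathbf{f}$ at $\mathbf{x}$ in the direction of a unit vector $\mathbf{v}$ exists and is equal to $\mathbf{J}_{\mathbf{f}}(\mathbf{x})\mathbf{v}$, where $\mathbf{J}_{\mathbf{f}}(\mathbf{x})$ is the $m \times n$ Jacobian of $\mathbf{f}$ at $\mathbf{x}$.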



Proof. Let us first consider the directional derivative of $f_j$ in the direction of $\mathbf{v}$. To do so, we rotate the coordinate axes so that $\mathbf{v}$ coincides with the direction of the $\xi_1$-axis, where $\xi_1, \xi_2, \ldots, \xi_n$ are the resulting new coordinates. By the well-known relations for rotation of axes in analytic geometry of $n$ dimensions we have

$$x_i = \xi_1 v_i + \sum_{l=2}^{n} \xi_l \lambda_{li}, \qquad i = 1, 2, \ldots, n, \tag{7.17}$$

where $v_i$ is the $i$th element of $\mathbf{v}$ ($i = 1, 2, \ldots, n$) and $\lambda_{li}$ is the $i$th element of $\boldsymbol{\lambda}_l$, the unit vector in the direction of the $\xi_l$-axis ($l = 2, 3, \ldots, n$).

Now, the directional derivative of $f_j$ in the direction of $\mathbf{v}$ can be obtained by first expressing $f_j$ as a function of $\xi_1, \xi_2, \ldots, \xi_n$ using the relations (7.17) and then differentiating it with respect to $\xi_1$. By formula (7.14), this is equal to

$$\frac{\partial f_j}{\partial \xi_1} = \sum_{i=1}^{n} \frac{\partial f_j}{\partial x_i}\,v_i, \qquad j = 1, 2, \ldots, m. \tag{7.18}$$

From formula (7.18) we conclude that the directional derivative of $\mathbf{f} = (f_1, f_2, \ldots, f_m)'$ in the direction of $\mathbf{v}$ is equal to $\mathbf{J}_{\mathbf{f}}(\mathbf{x})\mathbf{v}$. □

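Example 7.4.3. Let $\mathbf{f}: R^3 \to R^2$ be defined as

$$\mathbf{f}(x_1, x_2, x_3) = \begin{bmatrix} x_1^2 + x_2^2 + x_3^2 \\ -x_1 x_2 + x_3^2 \end{bmatrix}.$$

The directional derivative of $\mathbf{f}$ at $\mathbf{x} = (1, 2, 1)'$ in the direction of $\mathbf{v} = (1/\sqrt{2}, -1/\sqrt{2}, 0)'$ is

$$\mathbf{J}_{\mathbf{f}}(\mathbf{x})\mathbf{v} = \begin{bmatrix} 2x_1 & 2x_2 & 2x_3 \\ -x_2 & -x_1 & 2x_3 \end{bmatrix}_{(1,2,1)} \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 2 \\ -2 & -1 & 2 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{bmatrix} = \begin{bmatrix} -\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}.$$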
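The sketch below compares $\mathbf{J}_{\mathbf{f}}(\mathbf{x})\mathbf{v}$ from Lemma 7.4.1 with the difference quotient in the definition of the directional derivative. It assumes the function of Example 7.4.3 reads $\mathbf{f} = (x_1^2 + x_2^2 + x_3^2,\ -x_1 x_2 + x_3^2)'$ (this reading of the second component is an assumption); the step size and helper names are illustrative.

```python
import math

def f(x):
    # Assumed reading of Example 7.4.3: f = (x1^2 + x2^2 + x3^2, -x1*x2 + x3^2)'
    x1, x2, x3 = x
    return [x1**2 + x2**2 + x3**2, -x1 * x2 + x3**2]

def jacobian(x):
    # Analytic 2 x 3 Jacobian of f
    x1, x2, x3 = x
    return [[2 * x1, 2 * x2, 2 * x3],
            [-x2, -x1, 2 * x3]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def directional_limit(x, v, h=1e-7):
    # Difference quotient [f(x + h v) - f(x)] / h for a small h
    fx = f(x)
    fxh = f([xi + h * vi for xi, vi in zip(x, v)])
    return [(b - a) / h for a, b in zip(fx, fxh)]

x = [1.0, 2.0, 1.0]
v = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
print(matvec(jacobian(x), v))      # Lemma 7.4.1: J_f(x) v
print(directional_limit(x, v))     # approximately the same vector
```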


Definition 7.4.1. Let $f: D \to R$, where $D \subset R^n$. If the partial derivatives $\partial f/\partial x_i$ ($i = 1, 2, \ldots, n$) exist at a point $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ in the interior of $D$, then the vector $(\partial f/\partial x_1, \partial f/\partial x_2, \ldots, \partial f/\partial x_n)'$ is called the gradient of $f$ at $\mathbf{x}$ and is denoted by $\nabla f(\mathbf{x})$. □

Using Definition 7.4.1, the directional derivative of $f$ at $\mathbf{x}$ in the direction of a unit vector $\mathbf{v}$ can be expressed as $\nabla f(\mathbf{x})'\mathbf{v}$, where $\nabla f(\mathbf{x})'$ denotes the transpose of $\nabla f(\mathbf{x})$.

The Geometric Meaning of the Gradient

Let $f: D \to R$, where $D \subset R^n$. Suppose that the partial derivatives of $f$ exist at a point $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ in the interior of $D$. Let $C$ denote a smooth curve that lies on the surface of $f(\mathbf{x}) = c_0$, where $c_0$ is a constant, and passes through the point $\mathbf{x}$. This curve can be represented by the equations $x_1 = g_1(t), x_2 = g_2(t), \ldots, x_n = g_n(t)$, where $a \le t \le b$. By formula (7.12), the total derivative of $f$ with respect to $t$ at $\mathbf{x}$ is

$$\frac{df}{dt} = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x})}{\partial x_i}\,\frac{dg_i}{dt}. \tag{7.19}$$

The vector $\boldsymbol{\lambda} = (dg_1/dt, dg_2/dt, \ldots, dg_n/dt)'$ is tangent to $C$ at $\mathbf{x}$. Thus from (7.19) we obtain

$$\frac{df}{dt} = \nabla f(\mathbf{x})'\boldsymbol{\lambda}. \tag{7.20}$$

Now, since $f[g_1(t), g_2(t), \ldots, g_n(t)] = c_0$ along $C$, then $df/dt = 0$ and hence $\nabla f(\mathbf{x})'\boldsymbol{\lambda} = 0$. This indicates that the gradient vector is orthogonal to $\boldsymbol{\lambda}$, and hence to $C$, at $\mathbf{x} \in D$. Since this result is true for any smooth curve through $\mathbf{x}$, we conclude that the gradient vector $\nabla f(\mathbf{x})$ is orthogonal to the surface of $f(\mathbf{x}) = c_0$ at $\mathbf{x}$.
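The orthogonality of the gradient to a level surface can be checked on a simple level curve. The sketch below uses an illustrative function that does not appear in the text, $f(x_1, x_2) = x_1^2 + 2x_2^2$, whose level set $f = c_0$ is an ellipse parametrized by $g_1(t) = \sqrt{c_0}\cos t$, $g_2(t) = \sqrt{c_0/2}\,\sin t$; the value $c_0 = 3$ is arbitrary. Along the curve, $\nabla f(\mathbf{x})'\boldsymbol{\lambda}$ should vanish, as in (7.20).

```python
import math

# Illustrative choice (not from the text): f(x1, x2) = x1^2 + 2*x2^2 and the level curve f = c0
c0 = 3.0

def g(t):
    # Parametrization x = g(t) of the ellipse x1^2 + 2*x2^2 = c0
    return (math.sqrt(c0) * math.cos(t), math.sqrt(c0 / 2) * math.sin(t))

def tangent(t):
    # lambda = (dg1/dt, dg2/dt)'
    return (-math.sqrt(c0) * math.sin(t), math.sqrt(c0 / 2) * math.cos(t))

def grad_f(x):
    return (2 * x[0], 4 * x[1])

for t in (0.0, 0.4, 1.0, 2.5):
    x = g(t)
    dot = sum(a * b for a, b in zip(grad_f(x), tangent(t)))
    print(t, x[0]**2 + 2 * x[1]**2, dot)   # f stays at c0, and grad' tangent is ~0
```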

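Definition 7.4.2. Let $f: D \to R$, where $D \subset R^n$. Then $\nabla f: D \to R^n$. The Jacobian matrix of $\nabla f(\mathbf{x})$ is called the Hessian matrix of $f$ and is denoted by $\mathbf{H}_f(\mathbf{x})$. Thus $\mathbf{H}_f(\mathbf{x}) = \mathbf{J}_{\nabla f}(\mathbf{x})$, that is,

$$\mathbf{H}_f(\mathbf{x}) = \begin{bmatrix}
\dfrac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2\,\partial x_1} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n\,\partial x_1} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial^2 f(\mathbf{x})}{\partial x_1\,\partial x_n} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2\,\partial x_n} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n^2}
\end{bmatrix}. \tag{7.21}$$

The determinant of $\mathbf{H}_f(\mathbf{x})$ is called the Hessian determinant. If the conditions of Theorem 7.4.1 regarding the commutative property of partial differentiation are valid, then $\mathbf{H}_f(\mathbf{x})$ is a symmetric matrix. As we shall see in Section 7.7, the Hessian matrix plays an important role in the identification of maxima and minima of a multivariable function. □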


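7.4.3. Differentiation of Composite Functions

Let $\mathbf{f}: D_1 \to R^m$, where $D_1 \subset R^n$, and let $\mathbf{g}: D_2 \to R^p$, where $D_2 \subset R^m$. Let $\mathbf{x}_0$ be an interior point of $D_1$ and $\mathbf{f}(\mathbf{x}_0)$ be an interior point of $D_2$. If the $m \times n$ Jacobian matrix $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$ and the $p \times m$ Jacobian matrix $\mathbf{J}_{\mathbf{g}}[\mathbf{f}(\mathbf{x}_0)]$ both exist, then the $p \times n$ Jacobian matrix $\mathbf{J}_{\mathbf{h}}(\mathbf{x}_0)$ for the composite function $\mathbf{h} = \mathbf{g} \circ \mathbf{f}$ exists and is given by

$$\mathbf{J}_{\mathbf{h}}(\mathbf{x}_0) = \mathbf{J}_{\mathbf{g}}[\mathbf{f}(\mathbf{x}_0)]\,\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0). \tag{7.22}$$

To prove formula (7.22), let us consider the $(k, i)$th element of $\mathbf{J}_{\mathbf{h}}(\mathbf{x}_0)$, namely $\partial h_k(\mathbf{x}_0)/\partial x_i$, where $h_k(\mathbf{x}_0) = g_k[\mathbf{f}(\mathbf{x}_0)]$ is the $k$th element of $\mathbf{h}(\mathbf{x}_0) = \mathbf{g}[\mathbf{f}(\mathbf{x}_0)]$, $i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, p$. By applying formula (7.14) we obtain

$$\frac{\partial h_k(\mathbf{x}_0)}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial g_k[\mathbf{f}(\mathbf{x}_0)]}{\partial f_j}\,\frac{\partial f_j(\mathbf{x}_0)}{\partial x_i}, \qquad i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, p, \tag{7.23}$$

where $f_j(\mathbf{x}_0)$ is the $j$th element of $\mathbf{f}(\mathbf{x}_0)$, $j = 1, 2, \ldots, m$. But $\partial g_k[\mathbf{f}(\mathbf{x}_0)]/\partial f_j$ is the $(k, j)$th element of $\mathbf{J}_{\mathbf{g}}[\mathbf{f}(\mathbf{x}_0)]$, and $\partial f_j(\mathbf{x}_0)/\partial x_i$ is the $(j, i)$th element of $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$; $k = 1, 2, \ldots, p$. Hence, formula (7.22) follows from formula (7.23) and the rule of matrix multiplication.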

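In particular, if $m = n = p$, then from formula (7.22), the determinant of $\mathbf{J}_{\mathbf{h}}(\mathbf{x}_0)$ is given by

$$\det[\mathbf{J}_{\mathbf{h}}(\mathbf{x}_0)] = \det[\mathbf{J}_{\mathbf{g}}(\mathbf{f}(\mathbf{x}_0))]\,\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)]. \tag{7.24}$$

Using the notation in formula (7.8), formula (7.24) can be expressed as

$$\frac{\partial(h_1, h_2, \ldots, h_n)}{\partial(x_1, x_2, \ldots, x_n)} = \frac{\partial(g_1, g_2, \ldots, g_n)}{\partial(f_1, f_2, \ldots, f_n)}\,\frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)}. \tag{7.25}$$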


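Example 7.4.4. Let $\mathbf{f}: R^2 \to R^3$ be given by

$$\mathbf{f}(x_1, x_2) = \begin{bmatrix} x_1^2 - x_2 \cos x_1 \\ x_1 x_2 \\ x_1^3 + x_2^3 \end{bmatrix}.$$

Let $g: R^3 \to R$ be defined as

$$g(\xi_1, \xi_2, \xi_3) = \xi_1 - \xi_2^2 + \xi_3,$$

where

$$\xi_1 = x_1^2 - x_2 \cos x_1, \qquad \xi_2 = x_1 x_2, \qquad \xi_3 = x_1^3 + x_2^3.$$

In this case,

$$\mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \begin{bmatrix} 2x_1 + x_2 \sin x_1 & -\cos x_1 \\ x_2 & x_1 \\ 3x_1^2 & 3x_2^2 \end{bmatrix}, \qquad \mathbf{J}_g[\mathbf{f}(\mathbf{x})] = (1, -2\xi_2, 1).$$

Hence, by formula (7.22),

$$\mathbf{J}_h(\mathbf{x}) = (1, -2x_1 x_2, 1) \begin{bmatrix} 2x_1 + x_2 \sin x_1 & -\cos x_1 \\ x_2 & x_1 \\ 3x_1^2 & 3x_2^2 \end{bmatrix} = (2x_1 + x_2 \sin x_1 - 2x_1 x_2^2 + 3x_1^2,\ -\cos x_1 - 2x_1^2 x_2 + 3x_2^2).$$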
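A numerical check of formula (7.22) for this example, as a minimal sketch: `J_h_formula` encodes the row vector just obtained, while `J_h_numeric` differentiates the composite $h = g \circ \mathbf{f}$ directly by central differences. Helper names and the step size are illustrative.

```python
import math

def f(x1, x2):
    # f: R^2 -> R^3 of Example 7.4.4
    return (x1**2 - x2 * math.cos(x1), x1 * x2, x1**3 + x2**3)

def g(xi1, xi2, xi3):
    # g: R^3 -> R, g = xi1 - xi2^2 + xi3
    return xi1 - xi2**2 + xi3

def J_h_formula(x1, x2):
    # Row vector obtained from (7.22): J_g[f(x)] J_f(x)
    return (2 * x1 + x2 * math.sin(x1) - 2 * x1 * x2**2 + 3 * x1**2,
            -math.cos(x1) - 2 * x1**2 * x2 + 3 * x2**2)

def J_h_numeric(x1, x2, h=1e-6):
    # Central differences of the composite h = g o f
    comp = lambda a, b: g(*f(a, b))
    return ((comp(x1 + h, x2) - comp(x1 - h, x2)) / (2 * h),
            (comp(x1, x2 + h) - comp(x1, x2 - h)) / (2 * h))

print(J_h_formula(0.8, -1.2))
print(J_h_numeric(0.8, -1.2))   # agrees with the line above up to rounding
```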


7.5. TAYLOR’S THEOREM FOR A MULTIVARIABLE FUNCTION 


We shall now consider a multidimensional analogue of Taylor’s theorem, 
which was discussed in Section 4.3 for a single-variable function. 

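Let us first introduce the following notation: Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. Then $\mathbf{x}'\nabla$ denotes a first-order differential operator of the form

$$\mathbf{x}'\nabla = \sum_{i=1}^{n} x_i \frac{\partial}{\partial x_i}.$$

The symbol $\nabla$, called the del operator, was used earlier to define the gradient vector. If $m$ is a positive integer, then $(\mathbf{x}'\nabla)^m$ denotes an $m$th-order differential operator. For example, for $m = n = 2$,

$$(\mathbf{x}'\nabla)^2 = \left(x_1 \frac{\partial}{\partial x_1} + x_2 \frac{\partial}{\partial x_2}\right)^2 = x_1^2 \frac{\partial^2}{\partial x_1^2} + 2x_1 x_2 \frac{\partial^2}{\partial x_1\,\partial x_2} + x_2^2 \frac{\partial^2}{\partial x_2^2}.$$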


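Thus $(\mathbf{x}'\nabla)^2$ is obtained by squaring $x_1\,\partial/\partial x_1 + x_2\,\partial/\partial x_2$ in the usual fashion, except that the squares of $\partial/\partial x_1$ and $\partial/\partial x_2$ are replaced by $\partial^2/\partial x_1^2$ and $\partial^2/\partial x_2^2$, respectively, and the product of $\partial/\partial x_1$ and $\partial/\partial x_2$ is replaced by $\partial^2/\partial x_1\,\partial x_2$ (here we are assuming that the commutative property of partial differentiation holds once these differential operators are applied to a real-valued function). In general, $(\mathbf{x}'\nabla)^m$ is obtained by a multinomial expansion of degree $m$ of the form

$$(\mathbf{x}'\nabla)^m = \sum_{k_1, k_2, \ldots, k_n} \binom{m}{k_1, k_2, \ldots, k_n} x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n} \frac{\partial^m}{\partial x_1^{k_1}\,\partial x_2^{k_2} \cdots \partial x_n^{k_n}},$$

where the sum is taken over all $n$-tuples $(k_1, k_2, \ldots, k_n)$ of nonnegative integers for which $\sum_{i=1}^{n} k_i = m$, and

$$\binom{m}{k_1, k_2, \ldots, k_n} = \frac{m!}{k_1!\,k_2! \cdots k_n!}.$$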

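If a real-valued function $f(\mathbf{x})$ has partial derivatives through order $m$, then an application of the differential operator $(\mathbf{x}'\nabla)^m$ to $f(\mathbf{x})$ results in

$$(\mathbf{x}'\nabla)^m f(\mathbf{x}) = \sum_{k_1, k_2, \ldots, k_n} \binom{m}{k_1, k_2, \ldots, k_n} x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n} \frac{\partial^m f(\mathbf{x})}{\partial x_1^{k_1}\,\partial x_2^{k_2} \cdots \partial x_n^{k_n}}. \tag{7.26}$$

The notation $(\mathbf{x}'\nabla)^m f(\mathbf{x}_0)$ indicates that $(\mathbf{x}'\nabla)^m f(\mathbf{x})$ is evaluated at $\mathbf{x}_0$, that is, the partial derivatives in (7.26) are evaluated at $\mathbf{x}_0$.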

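Theorem 7.5.1. Let $f: D \to R$, where $D \subset R^n$, and let $N_\delta(\mathbf{x}_0)$ be a neighborhood of $\mathbf{x}_0 \in D$ such that $N_\delta(\mathbf{x}_0) \subset D$. If $f$ and all its partial derivatives of order $\le r$ exist and are continuous in $N_\delta(\mathbf{x}_0)$, then for any $\mathbf{x} \in N_\delta(\mathbf{x}_0)$,

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^{r-1} \frac{[(\mathbf{x} - \mathbf{x}_0)'\nabla]^i f(\mathbf{x}_0)}{i!} + \frac{[(\mathbf{x} - \mathbf{x}_0)'\nabla]^r f(\mathbf{z}_0)}{r!}, \tag{7.27}$$

where $\mathbf{z}_0$ is a point on the line segment from $\mathbf{x}_0$ to $\mathbf{x}$.

Proof. Let $\mathbf{h} = \mathbf{x} - \mathbf{x}_0$. Let the function $\phi(t)$ be defined as $\phi(t) = f(\mathbf{x}_0 + t\mathbf{h})$, where $0 \le t \le 1$. If $t = 0$, then $\phi(0) = f(\mathbf{x}_0)$, and if $t = 1$, then $\phi(1) = f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x})$. Now, by formula (7.12),

$$\frac{d\phi(t)}{dt} = \sum_{i=1}^{n} h_i \left[\frac{\partial f(\mathbf{x})}{\partial x_i}\right]_{\mathbf{x} = \mathbf{x}_0 + t\mathbf{h}} = (\mathbf{h}'\nabla) f(\mathbf{x}_0 + t\mathbf{h}),$$

where $h_i$ is the $i$th element of $\mathbf{h}$ ($i = 1, 2, \ldots, n$). Furthermore, the derivative of order $m$ of $\phi(t)$ is

$$\frac{d^m \phi(t)}{dt^m} = (\mathbf{h}'\nabla)^m f(\mathbf{x}_0 + t\mathbf{h}), \qquad 1 \le m \le r.$$

Since the partial derivatives of $f$ through order $r$ are continuous, then the same order derivatives of $\phi(t)$ are also continuous on $[0, 1]$ and

$$\left.\frac{d^m \phi(t)}{dt^m}\right|_{t=0} = (\mathbf{h}'\nabla)^m f(\mathbf{x}_0), \qquad 1 \le m \le r.$$

If we now apply Taylor's theorem in Section 4.3 to the single-variable function $\phi(t)$, we obtain

$$\phi(t) = \phi(0) + \sum_{i=1}^{r-1} \frac{t^i}{i!} \left.\frac{d^i \phi(t)}{dt^i}\right|_{t=0} + \frac{t^r}{r!} \left.\frac{d^r \phi(t)}{dt^r}\right|_{t=\xi}, \tag{7.28}$$

where $0 < \xi < t$. By setting $t = 1$ in formula (7.28), we obtain

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^{r-1} \frac{[(\mathbf{x} - \mathbf{x}_0)'\nabla]^i f(\mathbf{x}_0)}{i!} + \frac{[(\mathbf{x} - \mathbf{x}_0)'\nabla]^r f(\mathbf{z}_0)}{r!},$$

where $\mathbf{z}_0 = \mathbf{x}_0 + \xi\mathbf{h}$. Since $0 < \xi < 1$, the point $\mathbf{z}_0$ lies on the line segment between $\mathbf{x}_0$ and $\mathbf{x}$. □

In particular, if $f(\mathbf{x})$ has partial derivatives of all orders in $N_\delta(\mathbf{x}_0)$, and the last term in (7.27) tends to zero as $r \to \infty$, then we have the series expansion

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^{\infty} \frac{[(\mathbf{x} - \mathbf{x}_0)'\nabla]^i f(\mathbf{x}_0)}{i!}. \tag{7.29}$$

In this case, the last term in formula (7.27) serves as a remainder of Taylor's series.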


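Example 7.5.1. Consider the function $f: R^2 \to R$ defined as $f(x_1, x_2) = x_1 x_2 + x_1^2 + e^{x_1} \cos x_2$. This function has partial derivatives of all orders. Thus in a neighborhood of $\mathbf{x}_0 = (0, 0)'$ we can write

$$f(x_1, x_2) = 1 + (\mathbf{x}'\nabla) f(0, 0) + \frac{1}{2!}(\mathbf{x}'\nabla)^2 f(0, 0) + \frac{1}{3!}(\mathbf{x}'\nabla)^3 f(\xi x_1, \xi x_2), \qquad 0 < \xi < 1.$$

It can be verified that

$$\begin{aligned}
(\mathbf{x}'\nabla) f(0, 0) &= x_1, \\
(\mathbf{x}'\nabla)^2 f(0, 0) &= 3x_1^2 + 2x_1 x_2 - x_2^2, \\
(\mathbf{x}'\nabla)^3 f(\xi x_1, \xi x_2) &= x_1^3 e^{\xi x_1} \cos(\xi x_2) - 3x_1^2 x_2 e^{\xi x_1} \sin(\xi x_2) - 3x_1 x_2^2 e^{\xi x_1} \cos(\xi x_2) + x_2^3 e^{\xi x_1} \sin(\xi x_2).
\end{aligned}$$

Hence,

$$f(x_1, x_2) = 1 + x_1 + \frac{1}{2!}(3x_1^2 + 2x_1 x_2 - x_2^2) + \frac{1}{3!}\left\{[x_1^3 - 3x_1 x_2^2]\, e^{\xi x_1} \cos(\xi x_2) + [x_2^3 - 3x_1^2 x_2]\, e^{\xi x_1} \sin(\xi x_2)\right\}.$$

The first three terms serve as a second-order approximation of $f(x_1, x_2)$, while the last term serves as a remainder.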
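The remainder statement of Theorem 7.5.1 can be checked numerically for this example. In the sketch below (illustrative point, grid, and helper names), the error of the second-order approximation at $(0.1, 0.05)$ should lie between the smallest and largest values the remainder term takes for $\xi \in [0, 1]$, and the ratio of the error to $s^3$ along the ray $(s, s/2)$ should settle down as $s$ shrinks.

```python
import math

def f(x1, x2):
    # Function of Example 7.5.1
    return x1 * x2 + x1**2 + math.exp(x1) * math.cos(x2)

def taylor2(x1, x2):
    # Second-order approximation: 1 + x1 + (1/2!)(3x1^2 + 2x1x2 - x2^2)
    return 1 + x1 + 0.5 * (3 * x1**2 + 2 * x1 * x2 - x2**2)

def remainder(x1, x2, xi):
    # Remainder (1/3!)(x'grad)^3 f evaluated at (xi*x1, xi*x2)
    e = math.exp(xi * x1)
    return ((x1**3 - 3 * x1 * x2**2) * e * math.cos(xi * x2)
            + (x2**3 - 3 * x1**2 * x2) * e * math.sin(xi * x2)) / 6

x1, x2 = 0.1, 0.05
err = f(x1, x2) - taylor2(x1, x2)
# By Theorem 7.5.1, err equals remainder(x1, x2, xi) for some xi in (0, 1),
# so it should lie between the extreme values of the remainder on a fine grid of xi.
values = [remainder(x1, x2, k / 1000) for k in range(1001)]
print(err, min(values), max(values))

for s in (0.4, 0.2, 0.1, 0.05):
    e = f(s, s / 2) - taylor2(s, s / 2)
    print(s, e / s**3)   # settles down as s shrinks, consistent with an O(||x||^3) error
```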


7.6. INVERSE AND IMPLICIT FUNCTION THEOREMS 

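Consider the function $\mathbf{f}: D \to R^n$, where $D \subset R^n$. Let $\mathbf{y} = \mathbf{f}(\mathbf{x})$. The purpose of this section is to present conditions for the existence of an inverse function $\mathbf{f}^{-1}$ which expresses $\mathbf{x}$ as a function of $\mathbf{y}$. These conditions are given in the next theorem, whose proof can be found in Sagan (1974, page 371). See also Fulks (1978, page 346).

Theorem 7.6.1 (Inverse Function Theorem). Let $\mathbf{f}: D \to R^n$, where $D$ is an open subset of $R^n$ and $\mathbf{f}$ has continuous first-order partial derivatives in $D$. If for some $\mathbf{x}_0 \in D$ the $n \times n$ Jacobian matrix $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$ is nonsingular, that is,

$$\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)] = \left.\frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)}\right|_{\mathbf{x} = \mathbf{x}_0} \neq 0,$$

where $f_i$ is the $i$th element of $\mathbf{f}$ ($i = 1, 2, \ldots, n$), then there exist an $\epsilon > 0$ and a $\delta > 0$ such that an inverse function $\mathbf{f}^{-1}$ exists in the neighborhood $N_\delta[\mathbf{f}(\mathbf{x}_0)]$ and takes values in the neighborhood $N_\epsilon(\mathbf{x}_0)$. Moreover, $\mathbf{f}^{-1}$ has continuous first-order partial derivatives in $N_\delta[\mathbf{f}(\mathbf{x}_0)]$, and its Jacobian matrix at $\mathbf{f}(\mathbf{x}_0)$ is the inverse of $\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)$; hence,

$$\det\{\mathbf{J}_{\mathbf{f}^{-1}}[\mathbf{f}(\mathbf{x}_0)]\} = \frac{1}{\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)]}. \tag{7.30}$$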


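Example 7.6.1. Let $\mathbf{f}: R^3 \to R^3$ be given by

$$\mathbf{f}(x_1, x_2, x_3) = \begin{bmatrix} 2x_1 x_2 - x_2 \\ x_1^2 + x_2 + 2x_3^2 \\ x_1 x_2 + x_2 \end{bmatrix}.$$

Here,

$$\mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \begin{bmatrix} 2x_2 & 2x_1 - 1 & 0 \\ 2x_1 & 1 & 4x_3 \\ x_2 & x_1 + 1 & 0 \end{bmatrix},$$

and $\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x})] = -12x_2 x_3$. Hence, at all $\mathbf{x} \in R^3$ at which $x_2 x_3 \neq 0$, $\mathbf{f}$ has a local inverse function $\mathbf{f}^{-1}$. For example, if $D = \{(x_1, x_2, x_3) \mid x_2 > 0, x_3 > 0\}$, then $\mathbf{f}$ is invertible in $D$. From the equations

$$\begin{aligned}
y_1 &= 2x_1 x_2 - x_2, \\
y_2 &= x_1^2 + x_2 + 2x_3^2, \\
y_3 &= x_1 x_2 + x_2,
\end{aligned}$$

we obtain the inverse function $\mathbf{x} = \mathbf{f}^{-1}(\mathbf{y})$, where

$$\begin{aligned}
x_1 &= \frac{y_1 + y_3}{2y_3 - y_1}, \\
x_2 &= \frac{-y_1 + 2y_3}{3}, \\
x_3 &= \left\{\frac{1}{2}\left[y_2 - \frac{(y_1 + y_3)^2}{(2y_3 - y_1)^2} - \frac{2y_3 - y_1}{3}\right]\right\}^{1/2}.
\end{aligned}$$

If, for example, we consider $\mathbf{x}_0 = (1, 1, 1)'$, then $\mathbf{y}_0 = \mathbf{f}(\mathbf{x}_0) = (1, 4, 2)'$, and $\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)] = -12$. The Jacobian matrix of $\mathbf{f}^{-1}$ at $\mathbf{y}_0$ is

$$\mathbf{J}_{\mathbf{f}^{-1}}(\mathbf{y}_0) = \begin{bmatrix} \tfrac{2}{3} & 0 & -\tfrac{1}{3} \\ -\tfrac{1}{3} & 0 & \tfrac{2}{3} \\ -\tfrac{1}{4} & \tfrac{1}{4} & 0 \end{bmatrix}.$$

Its determinant is equal to

$$\det[\mathbf{J}_{\mathbf{f}^{-1}}(\mathbf{y}_0)] = -\frac{1}{12}.$$

We note that this is the reciprocal of $\det[\mathbf{J}_{\mathbf{f}}(\mathbf{x}_0)]$, as it should be according to formula (7.30).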
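The sketch below checks Example 7.6.1 numerically: `f_inv` encodes the inverse formulas (with $x_3$ written through the already computed $x_1$ and $x_2$, which is algebraically the same expression), and the Jacobian determinants of $\mathbf{f}$ at $\mathbf{x}_0$ and of $\mathbf{f}^{-1}$ at $\mathbf{y}_0$ are approximated by central differences. The helper names and step size are illustrative assumptions.

```python
import math

def f(x):
    x1, x2, x3 = x
    return (2 * x1 * x2 - x2, x1**2 + x2 + 2 * x3**2, x1 * x2 + x2)

def f_inv(y):
    # Inverse on D = {x2 > 0, x3 > 0}, following the formulas of Example 7.6.1
    y1, y2, y3 = y
    x1 = (y1 + y3) / (2 * y3 - y1)
    x2 = (2 * y3 - y1) / 3
    x3 = math.sqrt(0.5 * (y2 - x1**2 - x2))
    return (x1, x2, x3)

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def num_jacobian(F, p, h=1e-6):
    # Column j holds the central difference of F with respect to the j-th argument
    cols = []
    for j in range(3):
        up, dn = list(p), list(p)
        up[j] += h
        dn[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(up), F(dn))])
    return [[cols[j][i] for j in range(3)] for i in range(3)]

x0 = (1.0, 1.0, 1.0)
y0 = f(x0)                              # (1, 4, 2)
print(f_inv(y0))                        # recovers (1, 1, 1)
print(det3(num_jacobian(f, x0)))        # about -12
print(det3(num_jacobian(f_inv, y0)))    # about -1/12, the reciprocal, as in (7.30)
```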

The inverse function theorem can be viewed as providing a unique solution to a system of $n$ equations given by $\mathbf{y} = \mathbf{f}(\mathbf{x})$. There are, however, situations in which $\mathbf{y}$ is not explicitly expressed as a function of $\mathbf{x}$. In general, we may have two vectors, $\mathbf{x}$ and $\mathbf{y}$, of orders $n \times 1$ and $m \times 1$, respectively, that satisfy the relation

$$\mathbf{g}(\mathbf{x}, \mathbf{y}) = \mathbf{0}, \tag{7.31}$$

where $\mathbf{g}: R^{m+n} \to R^n$. In this more general case, we have $n$ equations involving $m + n$ variables, namely, the elements of $\mathbf{x}$ and those of $\mathbf{y}$. The question now is what conditions will allow us to solve equations (7.31) uniquely for $\mathbf{x}$ in terms of $\mathbf{y}$. The answer to this question is given in the next theorem, whose proof can be found in Fulks (1978, page 352).

Theorem 7.6.2 (Implicit Function Theorem). Let $\mathbf{g}: D \to R^n$, where $D$ is an open subset of $R^{m+n}$, and $\mathbf{g}$ has continuous first-order partial derivatives in $D$. If there is a point $\mathbf{z}_0 \in D$, where $\mathbf{z}_0 = (\mathbf{x}_0', \mathbf{y}_0')'$ with $\mathbf{x}_0 \in R^n$, $\mathbf{y}_0 \in R^m$, such that $\mathbf{g}(\mathbf{z}_0) = \mathbf{0}$, and if at $\mathbf{z}_0$,

$$\frac{\partial(g_1, g_2, \ldots, g_n)}{\partial(x_1, x_2, \ldots, x_n)} \neq 0,$$

where $g_i$ is the $i$th element of $\mathbf{g}$ ($i = 1, 2, \ldots, n$), then there is a neighborhood $N_\delta(\mathbf{y}_0)$ of $\mathbf{y}_0$ in which the equation $\mathbf{g}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ can be solved uniquely for $\mathbf{x}$ as a continuously differentiable function of $\mathbf{y}$.



Example 7.6.2. Let $\mathbf{g}: R^3 \to R^2$ be given by

$$\mathbf{g}(x_1, x_2, y) = \begin{bmatrix} x_1 + x_2 + y^2 - 18 \\ x_1 - x_1 x_2 + y - 4 \end{bmatrix}.$$

We have

$$\frac{\partial(g_1, g_2)}{\partial(x_1, x_2)} = \det\begin{bmatrix} 1 & 1 \\ 1 - x_2 & -x_1 \end{bmatrix} = x_2 - x_1 - 1.$$

Let $\mathbf{z} = (x_1, x_2, y)'$. At the point $\mathbf{z}_0 = (1, 1, 4)'$, for example, $\mathbf{g}(\mathbf{z}_0) = \mathbf{0}$ and $\partial(g_1, g_2)/\partial(x_1, x_2) = -1 \neq 0$. Hence, by Theorem 7.6.2, we can solve the equations

$$x_1 + x_2 + y^2 - 18 = 0, \tag{7.32}$$

$$x_1 - x_1 x_2 + y - 4 = 0 \tag{7.33}$$

uniquely in terms of $y$ in some neighborhood of $y_0 = 4$. For example, if $D$ in Theorem 7.6.2 is of the form

$$D = \{(x_1, x_2, y) \mid x_1 > 0, x_2 > 0, y < 4.06\},$$

then from equations (7.32) and (7.33) we obtain the solution

$$\begin{aligned}
x_1 &= \frac{1}{2}\left\{-(y^2 - 17) + \left[(y^2 - 17)^2 - 4y + 16\right]^{1/2}\right\}, \\
x_2 &= \frac{1}{2}\left\{19 - y^2 - \left[(y^2 - 17)^2 - 4y + 16\right]^{1/2}\right\}.
\end{aligned}$$

We note that the sign preceding the square root in the formula for $x_1$ was chosen as $+$, so that $x_1 = x_2 = 1$ when $y = 4$. It can be verified that $(y^2 - 17)^2 - 4y + 16$ is positive for $y < 4.06$.
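A quick check of the explicit solution in Example 7.6.2, as an illustrative sketch (helper names are assumptions): for a few values of $y$ below 4.06, the formulas for $x_1$ and $x_2$ should give positive values that make both left-hand sides of (7.32) and (7.33) vanish, and the Jacobian determinant $x_2 - x_1 - 1$ equals $-1$ at $\mathbf{z}_0$.

```python
import math

def x_of_y(y):
    # Explicit solution of (7.32)-(7.33) on D = {x1 > 0, x2 > 0, y < 4.06}
    r = math.sqrt((y**2 - 17)**2 - 4 * y + 16)
    x1 = 0.5 * (-(y**2 - 17) + r)
    x2 = 0.5 * (19 - y**2 - r)
    return x1, x2

def g(x1, x2, y):
    # Left-hand sides of (7.32) and (7.33)
    return (x1 + x2 + y**2 - 18, x1 - x1 * x2 + y - 4)

def jac_det(x1, x2):
    # d(g1, g2)/d(x1, x2) = det [[1, 1], [1 - x2, -x1]] = x2 - x1 - 1
    return x2 - x1 - 1

for y in (3.9, 3.95, 4.0, 4.05):
    x1, x2 = x_of_y(y)
    print(y, (x1, x2), g(x1, x2, y), jac_det(x1, x2))
```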


7.7. OPTIMA OF A MULTIVARIABLE FUNCTION 

Let $f(\mathbf{x})$ be a real-valued function defined on a set $D \subset R^n$. A point $\mathbf{x}_0 \in D$ is said to be a point of local maximum of $f$ if there exists a neighborhood $N_\delta(\mathbf{x}_0) \subset D$ such that $f(\mathbf{x}) \le f(\mathbf{x}_0)$ for all $\mathbf{x} \in N_\delta(\mathbf{x}_0)$. If $f(\mathbf{x}) \ge f(\mathbf{x}_0)$ for all $\mathbf{x} \in N_\delta(\mathbf{x}_0)$, then $\mathbf{x}_0$ is a point of local minimum. If one of these inequalities holds for all $\mathbf{x}$ in $D$, then $\mathbf{x}_0$ is called a point of absolute maximum, or a point of absolute minimum, respectively, of $f$ in $D$. In either case, $\mathbf{x}_0$ is referred to as a point of optimum (or extremum), and the value of $f(\mathbf{x})$ at $\mathbf{x} = \mathbf{x}_0$ is called an optimum value of $f(\mathbf{x})$.



In this section we shall discuss conditions under which $f(\mathbf{x})$ attains local optima in $D$. Then, we shall investigate the determination of the optima of $f(\mathbf{x})$ over a constrained region of $D$.

As in the case of a single-variable function, if $f(\mathbf{x})$ has first-order partial derivatives at a point $\mathbf{x}_0$ in the interior of $D$, and if $\mathbf{x}_0$ is a point of local optimum, then $\partial f/\partial x_i = 0$ for $i = 1, 2, \ldots, n$ at $\mathbf{x}_0$. The proof of this fact is similar to that of Theorem 4.4.1. Thus the vanishing of the first-order partial derivatives of $f(\mathbf{x})$ at $\mathbf{x}_0$ is a necessary condition for a local optimum at $\mathbf{x}_0$, but it is not sufficient: the first-order partial derivatives can be zero without $f$ having a local optimum at $\mathbf{x}_0$. For example, $f(x_1, x_2) = x_1^2 - x_2^2$ has both first-order partial derivatives equal to zero at $(0, 0)'$, yet it takes positive values along the $x_1$-axis and negative values along the $x_2$-axis arbitrarily close to the origin.

In general, any point at which $\partial f/\partial x_i = 0$ for $i = 1, 2, \ldots, n$ is called a stationary point. It follows that any point of local optimum at which $f$ has first-order partial derivatives is a stationary point, but not every stationary point is a point of local optimum. If no local optimum is attained at a stationary point $\mathbf{x}_0$, then $\mathbf{x}_0$ is called a saddle point. The following theorem gives the conditions needed to have a local optimum at a stationary point.

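Theorem 7.7.1. Let $f: D \to R$, where $D \subset R^n$. Suppose that $f$ has continuous second-order partial derivatives in $D$. If $\mathbf{x}_0$ is a stationary point of $f$, then at $\mathbf{x}_0$ $f$ has the following:

i. A local minimum if $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) > 0$ for all $\mathbf{h} = (h_1, h_2, \ldots, h_n)'$ in a neighborhood of $\mathbf{0}$, where the elements of $\mathbf{h}$ are not all equal to zero.

ii. A local maximum if $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) < 0$, where $\mathbf{h}$ is the same as in (i).

iii. A saddle point if $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0)$ changes sign for values of $\mathbf{h}$ in a neighborhood of $\mathbf{0}$.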

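Proof. By applying Taylor's theorem to $f(\mathbf{x})$ in a neighborhood of $\mathbf{x}_0$ we obtain

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + (\mathbf{h}'\nabla) f(\mathbf{x}_0) + \frac{1}{2!}(\mathbf{h}'\nabla)^2 f(\mathbf{z}_0),$$

where $\mathbf{h}$ is a nonzero vector in a neighborhood of $\mathbf{0}$ and $\mathbf{z}_0$ is a point on the line segment from $\mathbf{x}_0$ to $\mathbf{x}_0 + \mathbf{h}$. Since $\mathbf{x}_0$ is a stationary point, then $(\mathbf{h}'\nabla) f(\mathbf{x}_0) = 0$. Hence,

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = \frac{1}{2!}(\mathbf{h}'\nabla)^2 f(\mathbf{z}_0).$$

Also, since the second-order partial derivatives of $f$ are continuous at $\mathbf{x}_0$, then we can write

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = \frac{1}{2!}(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) + o(\|\mathbf{h}\|^2),$$

where $\|\mathbf{h}\| = (\mathbf{h}'\mathbf{h})^{1/2}$ and $o(\|\mathbf{h}\|^2)/\|\mathbf{h}\|^2 \to 0$ as $\mathbf{h} \to \mathbf{0}$. We note that for small values of $\|\mathbf{h}\|$, the sign of $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ depends on the value of $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0)$. It follows that if

i. $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) > 0$, then $f(\mathbf{x}_0 + \mathbf{h}) > f(\mathbf{x}_0)$ for all nonzero values of $\mathbf{h}$ in some neighborhood of $\mathbf{0}$. Thus $\mathbf{x}_0$ is a point of local minimum of $f$.

ii. $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) < 0$, then $f(\mathbf{x}_0 + \mathbf{h}) < f(\mathbf{x}_0)$ for all nonzero values of $\mathbf{h}$ in some neighborhood of $\mathbf{0}$. In this case, $\mathbf{x}_0$ is a point of local maximum of $f$.

iii. $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0)$ changes sign inside a neighborhood of $\mathbf{0}$, then $\mathbf{x}_0$ is neither a point of local maximum nor a point of local minimum. Therefore, $\mathbf{x}_0$ must be a saddle point. □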

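We note that $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0)$ can be written as a quadratic form $\mathbf{h}'\mathbf{A}\mathbf{h}$, where $\mathbf{A} = \mathbf{H}_f(\mathbf{x}_0)$ is the $n \times n$ Hessian matrix of $f$ evaluated at $\mathbf{x}_0$, that is,

$$\mathbf{A} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix}, \tag{7.34}$$

where for simplicity we have denoted $\partial^2 f(\mathbf{x}_0)/\partial x_i\,\partial x_j$ by $f_{ij}$, $i, j = 1, 2, \ldots, n$ [see formula (7.21)].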

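Corollary 7.7.1. Let $f$ be the same function as in Theorem 7.7.1, and let $\mathbf{A}$ be the matrix given by formula (7.34). If $\mathbf{x}_0$ is a stationary point of $f$, then at $\mathbf{x}_0$ $f$ has the following:

i. A local minimum if $\mathbf{A}$ is positive definite, that is, the leading principal minors of $\mathbf{A}$ (see Definition 2.3.6) are all positive:

$$f_{11} > 0, \qquad \det\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix} > 0, \qquad \ldots, \qquad \det(\mathbf{A}) > 0. \tag{7.35}$$

ii. A local maximum if $\mathbf{A}$ is negative definite, that is, the leading principal minors of $\mathbf{A}$ have alternating signs as follows:

$$f_{11} < 0, \qquad \det\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix} > 0, \qquad \ldots, \qquad (-1)^n \det(\mathbf{A}) > 0. \tag{7.36}$$

iii. A saddle point if $\mathbf{A}$ is neither positive definite nor negative definite.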



Proof.

i. By Theorem 7.7.1, $f$ has a local minimum at $\mathbf{x}_0$ if $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) = \mathbf{h}'\mathbf{A}\mathbf{h}$ is positive for all $\mathbf{h} \neq \mathbf{0}$, that is, if $\mathbf{A}$ is positive definite. By Theorem 2.3.12(2), $\mathbf{A}$ is positive definite if and only if its leading principal minors are all positive. The conditions stated in (7.35) are therefore sufficient for a local minimum at $\mathbf{x}_0$.

ii. $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0) < 0$ for all $\mathbf{h} \neq \mathbf{0}$ if and only if $\mathbf{A}$ is negative definite, or $-\mathbf{A}$ is positive definite. Now, a leading principal minor of order $m$ ($= 1, 2, \ldots, n$) of $-\mathbf{A}$ is equal to $(-1)^m$ multiplied by the corresponding leading principal minor of $\mathbf{A}$. This leads to conditions (7.36).

iii. If $\mathbf{A}$ is neither positive definite nor negative definite, then $(\mathbf{h}'\nabla)^2 f(\mathbf{x}_0)$ must change sign inside a neighborhood of $\mathbf{0}$. This makes $\mathbf{x}_0$ a saddle point. □

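A Special Case

If $f$ is a function of only $n = 2$ variables, $x_1$ and $x_2$, then conditions (7.35) and (7.36) can be written as:

i. $f_{11} > 0$, $f_{11} f_{22} - f_{12}^2 > 0$ for a local minimum at $\mathbf{x}_0$.
ii. $f_{11} < 0$, $f_{11} f_{22} - f_{12}^2 > 0$ for a local maximum at $\mathbf{x}_0$.

If $f_{11} f_{22} - f_{12}^2 < 0$, then $\mathbf{x}_0$ is a saddle point, since in this case

$$\mathbf{h}'\mathbf{A}\mathbf{h} = h_1^2 \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1^2} + 2h_1 h_2 \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1\,\partial x_2} + h_2^2 \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_2^2} = \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1^2}(h_1 - a h_2)(h_1 - b h_2),$$

where $a h_2$ and $b h_2$ are the real roots of the equation $\mathbf{h}'\mathbf{A}\mathbf{h} = 0$ with respect to $h_1$. Hence, $\mathbf{h}'\mathbf{A}\mathbf{h}$ changes sign in a neighborhood of $\mathbf{0}$.

If $f_{11} f_{22} - f_{12}^2 = 0$, then $\mathbf{h}'\mathbf{A}\mathbf{h}$ can be written as

$$\mathbf{h}'\mathbf{A}\mathbf{h} = \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1^2}\left[h_1 + h_2\,\frac{\partial^2 f(\mathbf{x}_0)/\partial x_1\,\partial x_2}{\partial^2 f(\mathbf{x}_0)/\partial x_1^2}\right]^2,$$

provided that $\partial^2 f(\mathbf{x}_0)/\partial x_1^2 \neq 0$. Thus $\mathbf{h}'\mathbf{A}\mathbf{h}$ has the same sign as that of $\partial^2 f(\mathbf{x}_0)/\partial x_1^2$ except for those values of $\mathbf{h} = (h_1, h_2)'$ for which

$$h_1 + h_2\,\frac{\partial^2 f(\mathbf{x}_0)/\partial x_1\,\partial x_2}{\partial^2 f(\mathbf{x}_0)/\partial x_1^2} = 0,$$

in which case it is zero. In the event $\partial^2 f(\mathbf{x}_0)/\partial x_1^2 = 0$, then $\partial^2 f(\mathbf{x}_0)/\partial x_1\,\partial x_2 = 0$, and $\mathbf{h}'\mathbf{A}\mathbf{h} = h_2^2\,\partial^2 f(\mathbf{x}_0)/\partial x_2^2$, which has the same sign as that of $\partial^2 f(\mathbf{x}_0)/\partial x_2^2$, if it is different from zero, except for those values of $\mathbf{h} = (h_1, h_2)'$ for which $h_2 = 0$, where it is zero. It follows that when $f_{11} f_{22} - f_{12}^2 = 0$, $\mathbf{h}'\mathbf{A}\mathbf{h}$ does not change sign inside a neighborhood of $\mathbf{0}$. However, it can vanish for some nonzero values of $\mathbf{h}$. For such values of $\mathbf{h}$, the sign of $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ depends on the signs of higher-order partial derivatives (higher than second order) of $f$ at $\mathbf{x}_0$. These can be obtained from Taylor's expansion. In this case, no decision can be made regarding the nature of the stationary point until these higher-order partial derivatives (if they exist) have been investigated.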


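Example 7.7.1. Let $f: R^2 \to R$ be the function $f(x_1, x_2) = x_1^2 + 2x_2^2 - x_1$. Consider the equations

$$\frac{\partial f}{\partial x_1} = 2x_1 - 1 = 0, \qquad \frac{\partial f}{\partial x_2} = 4x_2 = 0.$$

The only solution is $\mathbf{x}_0 = (0.5, 0)'$. The Hessian matrix is

$$\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix},$$

which is positive definite, since $2 > 0$ and $\det(\mathbf{A}) = 8 > 0$. The point $\mathbf{x}_0$ is therefore a local minimum. It is the only stationary point in $R^2$, and it is also the absolute minimum, since $f(x_1, x_2) - f(\mathbf{x}_0) = (x_1 - 0.5)^2 + 2x_2^2 \ge 0$ for all $(x_1, x_2)' \in R^2$.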
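The two-variable rules of the special case translate into a small decision function. The sketch below (hypothetical name `classify_2d`) takes $f_{11}$, $f_{12}$, $f_{22}$ at a stationary point and returns the conclusion, reporting the case $f_{11} f_{22} - f_{12}^2 = 0$ as inconclusive, as discussed for the special case $n = 2$.

```python
def classify_2d(f11, f12, f22):
    # Special case n = 2: decide from f11 and the determinant f11*f22 - f12^2
    d = f11 * f22 - f12**2
    if d > 0:
        # d > 0 forces f11 != 0, so the sign of f11 separates minimum from maximum
        return "local minimum" if f11 > 0 else "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive: higher-order derivatives needed"

# Example 7.7.1: f11 = 2, f12 = 0, f22 = 4 at x0 = (0.5, 0)'
print(classify_2d(2, 0, 4))    # local minimum
# f(x1, x2) = x1^2 - x2^2 at the origin: f11 = 2, f12 = 0, f22 = -2
print(classify_2d(2, 0, -2))   # saddle point
```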


Example 7.7.2. Consider the function $f: R^3 \to R$, where

$$f(x_1, x_2, x_3) = \tfrac{1}{3}x_1^3 + 2x_2^2 + x_3^2 - 2x_1 x_2 + 3x_1 x_3 + x_2 x_3 - 10x_1 + 4x_2 - 6x_3 + 1.$$

A stationary point must satisfy the equations

$$\frac{\partial f}{\partial x_1} = x_1^2 - 2x_2 + 3x_3 - 10 = 0, \tag{7.37}$$

$$\frac{\partial f}{\partial x_2} = -2x_1 + 4x_2 + x_3 + 4 = 0, \tag{7.38}$$

$$\frac{\partial f}{\partial x_3} = 3x_1 + x_2 + 2x_3 - 6 = 0. \tag{7.39}$$

From (7.38) and (7.39) we get

$$x_2 = x_1 - 2, \qquad x_3 = 4 - 2x_1.$$

By substituting these expressions in equation (7.37) we obtain

$$x_1^2 - 8x_1 + 6 = 0.$$

This equation has two solutions, namely, $4 - \sqrt{10}$ and $4 + \sqrt{10}$. We therefore have two stationary points,

$$\mathbf{x}_0^{(1)} = (4 + \sqrt{10},\ 2 + \sqrt{10},\ -4 - 2\sqrt{10})', \qquad \mathbf{x}_0^{(2)} = (4 - \sqrt{10},\ 2 - \sqrt{10},\ -4 + 2\sqrt{10})'.$$

Now, the Hessian matrix is

$$\mathbf{A} = \begin{bmatrix} 2x_1 & -2 & 3 \\ -2 & 4 & 1 \\ 3 & 1 & 2 \end{bmatrix}.$$

Its leading principal minors are $2x_1$, $8x_1 - 4$, and $14x_1 - 56$. The last one is the determinant of $\mathbf{A}$. At $\mathbf{x}_0^{(1)}$ all three are positive. Therefore, $\mathbf{x}_0^{(1)}$ is a point of local minimum. At $\mathbf{x}_0^{(2)}$ the values of the leading principal minors are $1.675$, $2.7018$, and $-44.272$. In this case, $\mathbf{A}$ is neither positive definite nor negative definite. Thus $\mathbf{x}_0^{(2)}$ is a saddle point.
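The computations of Example 7.7.2 can be reproduced in a few lines. The sketch below evaluates (7.37) to (7.39) at both stationary points, forms the leading principal minors of the Hessian directly from the matrix, and applies Corollary 7.7.1. The classification helper is an illustrative assumption that only labels a point a saddle when $\det(\mathbf{A}) \neq 0$, since a nonsingular matrix that is neither positive nor negative definite is indefinite.

```python
import math

def grad(x):
    # Left-hand sides of (7.37), (7.38), (7.39)
    x1, x2, x3 = x
    return (x1**2 - 2 * x2 + 3 * x3 - 10,
            -2 * x1 + 4 * x2 + x3 + 4,
            3 * x1 + x2 + 2 * x3 - 6)

def hessian(x1):
    return [[2 * x1, -2, 3], [-2, 4, 1], [3, 1, 2]]

def leading_minors(A):
    m1 = A[0][0]
    m2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    m3 = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return m1, m2, m3

def classify(minors):
    # Corollary 7.7.1 for n = 3; with det(A) != 0, "neither" means indefinite
    m1, m2, m3 = minors
    if m3 == 0:
        return "inconclusive"
    if m1 > 0 and m2 > 0 and m3 > 0:
        return "local minimum"
    if m1 < 0 and m2 > 0 and m3 < 0:
        return "local maximum"
    return "saddle point"

r = math.sqrt(10)
points = [(4 + r, 2 + r, -4 - 2 * r), (4 - r, 2 - r, -4 + 2 * r)]
for x in points:
    minors = leading_minors(hessian(x[0]))
    print([round(c, 4) for c in x], [round(m, 4) for m in minors], classify(minors))
```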


7.8. THE METHOD OF LAGRANGE MULTIPLIERS 


This method, which is due to Joseph Louis de Lagrange (1736–1813), is used to optimize a real-valued function $f(x_1, x_2, \ldots, x_n)$, where $x_1, x_2, \ldots, x_n$ are subject to $m$ ($< n$) equality constraints of the form

$$\begin{aligned}
g_1(x_1, x_2, \ldots, x_n) &= 0, \\
g_2(x_1, x_2, \ldots, x_n) &= 0, \\
&\ \ \vdots \\
g_m(x_1, x_2, \ldots, x_n) &= 0,
\end{aligned} \tag{7.40}$$

where $g_1, g_2, \ldots, g_m$ are differentiable functions.

The determination of the stationary points in this constrained optimization problem is done by first considering the function

$$F(\mathbf{x}) = f(\mathbf{x}) + \sum_{j=1}^{m} \lambda_j g_j(\mathbf{x}), \tag{7.41}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ and $\lambda_1, \lambda_2, \ldots, \lambda_m$ are scalars called Lagrange multipliers. By differentiating (7.41) with respect to $x_1, x_2, \ldots, x_n$ and equating the partial derivatives to zero we obtain

$$\frac{\partial F}{\partial x_i} = \frac{\partial f}{\partial x_i} + \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial x_i} = 0, \qquad i = 1, 2, \ldots, n. \tag{7.42}$$

Equations (7.40) and (7.42) consist of $m + n$ equations in $m + n$ unknowns, namely, $x_1, x_2, \ldots, x_n$; $\lambda_1, \lambda_2, \ldots, \lambda_m$. The solutions for $x_1, x_2, \ldots, x_n$ determine the locations of the stationary points. The following argument explains why this is the case:

Suppose that in equation (7.40) we can solve for $m$ of the $x_i$'s, for example, $x_1, x_2, \ldots, x_m$, in terms of the remaining $n - m$ variables. By Theorem 7.6.2, this is possible whenever

$$\frac{\partial(g_1, g_2, \ldots, g_m)}{\partial(x_1, x_2, \ldots, x_m)} \neq 0. \tag{7.43}$$

In this case, we can write

$$\begin{aligned}
x_1 &= h_1(x_{m+1}, x_{m+2}, \ldots, x_n), \\
x_2 &= h_2(x_{m+1}, x_{m+2}, \ldots, x_n), \\
&\ \ \vdots \\
x_m &= h_m(x_{m+1}, x_{m+2}, \ldots, x_n).
\end{aligned} \tag{7.44}$$

Thus $f(\mathbf{x})$ is a function of only $n - m$ variables, namely, $x_{m+1}, x_{m+2}, \ldots, x_n$. If the partial derivatives of $f$ with respect to these variables exist and if $f$ has a local optimum, then these partial derivatives must necessarily vanish, that is,

$$\frac{\partial f}{\partial x_i} + \sum_{j=1}^{m} \frac{\partial f}{\partial h_j}\,\frac{\partial h_j}{\partial x_i} = 0, \qquad i = m+1, m+2, \ldots, n. \tag{7.45}$$

Now, if equations (7.44) are used to substitute $h_1, h_2, \ldots, h_m$ for $x_1, x_2, \ldots, x_m$, respectively, in equation (7.40), then we obtain the identities

$$\begin{aligned}
g_1(h_1, h_2, \ldots, h_m, x_{m+1}, x_{m+2}, \ldots, x_n) &= 0, \\
g_2(h_1, h_2, \ldots, h_m, x_{m+1}, x_{m+2}, \ldots, x_n) &= 0, \\
&\ \ \vdots \\
g_m(h_1, h_2, \ldots, h_m, x_{m+1}, x_{m+2}, \ldots, x_n) &= 0.
\end{aligned}$$


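By differentiating these identities with respect to $x_{m+1}, x_{m+2}, \ldots, x_n$ we obtain

$$\frac{\partial g_k}{\partial x_i} + \sum_{j=1}^{m} \frac{\partial g_k}{\partial h_j}\,\frac{\partial h_j}{\partial x_i} = 0, \qquad i = m+1, m+2, \ldots, n;\ k = 1, 2, \ldots, m. \tag{7.46}$$

Let us now define the vectors

$$\begin{aligned}
\boldsymbol{\delta}_k &= \left(\frac{\partial g_k}{\partial x_{m+1}}, \frac{\partial g_k}{\partial x_{m+2}}, \ldots, \frac{\partial g_k}{\partial x_n}\right)', & k &= 1, 2, \ldots, m, \\
\boldsymbol{\gamma}_k &= \left(\frac{\partial g_k}{\partial h_1}, \frac{\partial g_k}{\partial h_2}, \ldots, \frac{\partial g_k}{\partial h_m}\right)', & k &= 1, 2, \ldots, m, \\
\boldsymbol{\eta}_j &= \left(\frac{\partial h_j}{\partial x_{m+1}}, \frac{\partial h_j}{\partial x_{m+2}}, \ldots, \frac{\partial h_j}{\partial x_n}\right)', & j &= 1, 2, \ldots, m, \\
\boldsymbol{\psi} &= \left(\frac{\partial f}{\partial x_{m+1}}, \frac{\partial f}{\partial x_{m+2}}, \ldots, \frac{\partial f}{\partial x_n}\right)', \\
\boldsymbol{\tau} &= \left(\frac{\partial f}{\partial h_1}, \frac{\partial f}{\partial h_2}, \ldots, \frac{\partial f}{\partial h_m}\right)'.
\end{aligned}$$

Equations (7.45) and (7.46) can then be written as

$$[\boldsymbol{\delta}_1 : \boldsymbol{\delta}_2 : \cdots : \boldsymbol{\delta}_m] + [\boldsymbol{\eta}_1 : \boldsymbol{\eta}_2 : \cdots : \boldsymbol{\eta}_m]\,\boldsymbol{\Gamma} = \mathbf{0}, \tag{7.47}$$

$$\boldsymbol{\psi} + [\boldsymbol{\eta}_1 : \boldsymbol{\eta}_2 : \cdots : \boldsymbol{\eta}_m]\,\boldsymbol{\tau} = \mathbf{0}, \tag{7.48}$$

where $\boldsymbol{\Gamma} = [\boldsymbol{\gamma}_1 : \boldsymbol{\gamma}_2 : \cdots : \boldsymbol{\gamma}_m]$, which is a nonsingular $m \times m$ matrix if condition (7.43) is valid. From equation (7.47) we have

$$[\boldsymbol{\eta}_1 : \boldsymbol{\eta}_2 : \cdots : \boldsymbol{\eta}_m] = -[\boldsymbol{\delta}_1 : \boldsymbol{\delta}_2 : \cdots : \boldsymbol{\delta}_m]\,\boldsymbol{\Gamma}^{-1}.$$

By making the proper substitution in equation (7.48) we obtain

$$\boldsymbol{\psi} + [\boldsymbol{\delta}_1 : \boldsymbol{\delta}_2 : \cdots : \boldsymbol{\delta}_m]\,\boldsymbol{\lambda} = \mathbf{0}, \tag{7.49}$$

where

$$\boldsymbol{\lambda} = -\boldsymbol{\Gamma}^{-1}\boldsymbol{\tau}. \tag{7.50}$$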


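Equations (7.49) can then be expressed as

$$\frac{\partial f}{\partial x_i} + \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial x_i} = 0, \qquad i = m+1, m+2, \ldots, n. \tag{7.51}$$

From equation (7.50) we also have

$$\frac{\partial f}{\partial x_i} + \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial x_i} = 0, \qquad i = 1, 2, \ldots, m. \tag{7.52}$$

Equations (7.51) and (7.52) can now be combined into a single vector equation of the form

$$\nabla f(\mathbf{x}) + \sum_{j=1}^{m} \lambda_j \nabla g_j = \mathbf{0},$$

which is the same as equation (7.42). We conclude that at a stationary point of $f$, the values of $x_1, x_2, \ldots, x_n$ and the corresponding values of $\lambda_1, \lambda_2, \ldots, \lambda_m$ must satisfy equations (7.40) and (7.42).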


Sufficient Conditions for a Local Optimum 
in the Method of Lagrange Multipliers 

Equations (7.42) are only necessary for a stationary point $\mathbf{x}_0$ to be a point of local optimum of $f$ subject to the constraints given by equations (7.40). Sufficient conditions for a local optimum are given in Gillespie (1954, pages 97–98). The following is a reproduction of these conditions:

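Let $\mathbf{x}_0$ be a stationary point of $f$ whose coordinates satisfy equations (7.40) and (7.42), and let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the corresponding Lagrange multipliers. Let $F_{ij}$ denote the second-order partial derivative of $F$ in formula (7.41) with respect to $x_i$ and $x_j$, $i, j = 1, 2, \ldots, n$. Consider the $(m+n) \times (m+n)$ matrix

$$\mathbf{B}_1 = \begin{bmatrix}
F_{11} & F_{12} & \cdots & F_{1n} & g_1^{(1)} & g_2^{(1)} & \cdots & g_m^{(1)} \\
F_{21} & F_{22} & \cdots & F_{2n} & g_1^{(2)} & g_2^{(2)} & \cdots & g_m^{(2)} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
F_{n1} & F_{n2} & \cdots & F_{nn} & g_1^{(n)} & g_2^{(n)} & \cdots & g_m^{(n)} \\
g_1^{(1)} & g_1^{(2)} & \cdots & g_1^{(n)} & 0 & 0 & \cdots & 0 \\
g_2^{(1)} & g_2^{(2)} & \cdots & g_2^{(n)} & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
g_m^{(1)} & g_m^{(2)} & \cdots & g_m^{(n)} & 0 & 0 & \cdots & 0
\end{bmatrix}, \tag{7.53}$$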


where = âgj / / = 1 , 2, . . . , n; 7 = 1, 2, . . . , m. Let dénoté the déter- 
minant of Furthermore, let A2, A3, . . . , dénoté a set of principal 
minors of (see Définition 2.3.6), namely, the déterminants of the principal 
submatrices B2, B3, . . . , B„_^ , where B- is obtained by deleting the first i — 1 
rows and the first i — 1 columns of B^ (i = 2 , 3 , . . . ,n — m). Ail the partial 
dérivatives used in B^, B2, . . . , B„_^ are evaluated at Xq. Then sufficient 
conditions for Xq to be a point of local minimum of / are the following: 


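i. If $m$ is even,

$$\Delta_1 > 0, \quad \Delta_2 > 0, \quad \ldots, \quad \Delta_{n-m} > 0.$$

ii. If $m$ is odd,

$$\Delta_1 < 0, \quad \Delta_2 < 0, \quad \ldots, \quad \Delta_{n-m} < 0.$$

On the other hand, sufficient conditions for $\mathbf{x}_0$ to be a point of local maximum are the following:

i. If $n$ is even,

$$\Delta_1 > 0, \quad \Delta_2 < 0, \quad \ldots, \quad (-1)^{n-m}\Delta_{n-m} < 0.$$

ii. If $n$ is odd,

$$\Delta_1 < 0, \quad \Delta_2 > 0, \quad \ldots, \quad (-1)^{n-m}\Delta_{n-m} > 0.$$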

Example 7.8.1. Let us find the minimum and maximum distances from the origin to the curve determined by the intersection of the plane $x_2 + x_3 = 0$ with the ellipsoid $x_1^2 + 2x_2^2 + x_3^2 + 2x_2x_3 = 1$. Let $f(x_1, x_2, x_3)$ be the squared distance function from the origin, that is,

$$f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2.$$

The equality constraints are

$$g_1(x_1, x_2, x_3) = x_2 + x_3 = 0,$$

$$g_2(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + x_3^2 + 2x_2x_3 - 1 = 0.$$

Then

$$F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 + \lambda_1(x_2 + x_3) + \lambda_2(x_1^2 + 2x_2^2 + x_3^2 + 2x_2x_3 - 1),$$

$$\frac{\partial F}{\partial x_1} = 2x_1 + 2\lambda_2 x_1 = 0, \qquad (7.54)$$

$$\frac{\partial F}{\partial x_2} = 2x_2 + \lambda_1 + 2\lambda_2(2x_2 + x_3) = 0, \qquad (7.55)$$

$$\frac{\partial F}{\partial x_3} = 2x_3 + \lambda_1 + 2\lambda_2(x_2 + x_3) = 0. \qquad (7.56)$$


Equations (7.54), (7.55), and (7.56) and the equality constraints are satisfied by the following sets of solutions:

I. $x_1 = 0$, $x_2 = 1$, $x_3 = -1$, $\lambda_1 = 2$, $\lambda_2 = -2$.

II. $x_1 = 0$, $x_2 = -1$, $x_3 = 1$, $\lambda_1 = -2$, $\lambda_2 = -2$.

III. $x_1 = 1$, $x_2 = 0$, $x_3 = 0$, $\lambda_1 = 0$, $\lambda_2 = -1$.

IV. $x_1 = -1$, $x_2 = 0$, $x_3 = 0$, $\lambda_1 = 0$, $\lambda_2 = -1$.


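To determine if any of these four sets correspond to local maxima or minima, we need to examine the values of $\Delta_1, \Delta_2, \ldots, \Delta_{n-m}$. Here, the matrix $\mathbf{B}_1$ in formula (7.53) has the value

$$
\mathbf{B}_1 = \begin{bmatrix}
2 + 2\lambda_2 & 0 & 0 & 0 & 2x_1 \\
0 & 2 + 4\lambda_2 & 2\lambda_2 & 1 & 4x_2 + 2x_3 \\
0 & 2\lambda_2 & 2 + 2\lambda_2 & 1 & 2x_2 + 2x_3 \\
0 & 1 & 1 & 0 & 0 \\
2x_1 & 4x_2 + 2x_3 & 2x_2 + 2x_3 & 0 & 0
\end{bmatrix}.
$$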


Since $n = 3$ and $m = 2$, only one $\Delta_i$ is needed, namely $\Delta_1$, the determinant of $\mathbf{B}_1$. Furthermore, since $m$ is even and $n$ is odd, a sufficient condition for a local minimum is $\Delta_1 > 0$, and for a local maximum the condition is $\Delta_1 < 0$. It can be verified that $\Delta_1 = -8$ for solution sets I and II, and $\Delta_1 = 8$ for solution sets III and IV. We therefore have local maxima at the points $(0, 1, -1)$ and $(0, -1, 1)$ with a common maximum value $f = 2$. We also have local minima at the points $(1, 0, 0)$ and $(-1, 0, 0)$ with a common minimum value $f = 1$. Since these are the only local optima on the curve of intersection, we conclude that the minimum distance from the origin to this curve is 1 and the maximum distance is $\sqrt{2}$.
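The following Python sketch is a quick check of Example 7.8.1 and is not part of the original text. It evaluates the left-hand sides of (7.54)-(7.56) at the four solution sets and computes $\Delta_1 = \det(\mathbf{B}_1)$ exactly with rational arithmetic. The helper names `det`, `grad_F`, and `B1` are hypothetical, and the determinant routine is plain Gaussian elimination.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Pick the first row at or below `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result              # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def grad_F(x, lam):
    """Left-hand sides of equations (7.54)-(7.56)."""
    x1, x2, x3 = x
    l1, l2 = lam
    return (2*x1 + 2*l2*x1,
            2*x2 + l1 + 2*l2*(2*x2 + x3),
            2*x3 + l1 + 2*l2*(x2 + x3))


def B1(x, lam):
    """The 5 x 5 bordered matrix of formula (7.53) for Example 7.8.1."""
    x1, x2, x3 = x
    l2 = lam[1]
    return [[2 + 2*l2, 0, 0, 0, 2*x1],
            [0, 2 + 4*l2, 2*l2, 1, 4*x2 + 2*x3],
            [0, 2*l2, 2 + 2*l2, 1, 2*x2 + 2*x3],
            [0, 1, 1, 0, 0],
            [2*x1, 4*x2 + 2*x3, 2*x2 + 2*x3, 0, 0]]


solutions = {"I": ((0, 1, -1), (2, -2)), "II": ((0, -1, 1), (-2, -2)),
             "III": ((1, 0, 0), (0, -1)), "IV": ((-1, 0, 0), (0, -1))}

for name, (x, lam) in solutions.items():
    assert grad_F(x, lam) == (0, 0, 0)        # stationarity, equation (7.42)
    d1 = det(B1(x, lam))
    # m = 2 (even), n = 3 (odd): Delta_1 > 0 -> local min, Delta_1 < 0 -> local max
    kind = "local min" if d1 > 0 else "local max"
    print(name, d1, kind, "f =", sum(v * v for v in x))
```

It should report $\Delta_1 = -8$ for sets I and II and $\Delta_1 = 8$ for sets III and IV, matching the values quoted in the example.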


7.9. THE RIEMANN INTEGRAL OF A MULTIVARIABLE FUNCTION 

In Chapter 6 we discussed the Riemann integral of a real-valued function of a single variable $x$. In this section we extend the concept of Riemann integration to real-valued functions of $n$ variables, $x_1, x_2, \ldots, x_n$.





Definition 7.9.1. The set of points in $R^n$ whose coordinates satisfy the inequalities

$$a_i \le x_i \le b_i, \qquad i = 1, 2, \ldots, n, \qquad (7.57)$$

where $a_i < b_i$, $i = 1, 2, \ldots, n$, form an $n$-dimensional cell denoted by $c_n(\mathbf{a}, \mathbf{b})$. The content (or volume) of this cell is $\prod_{i=1}^{n} (b_i - a_i)$ and is denoted by $\mu[c_n(\mathbf{a}, \mathbf{b})]$.

Suppose that $P_i$ is a partition of the interval $[a_i, b_i]$, $i = 1, 2, \ldots, n$. The Cartesian product $P = P_1 \times P_2 \times \cdots \times P_n$ is a partition of $c_n(\mathbf{a}, \mathbf{b})$ and consists of $n$-dimensional subcells of $c_n(\mathbf{a}, \mathbf{b})$. We denote these subcells by $S_1, S_2, \ldots, S_\nu$. The content of $S_i$ is denoted by $\mu(S_i)$, $i = 1, 2, \ldots, \nu$, where $\nu$ is the number of subcells. □

We shall first define the Riemann integral of a real-valued function $f(\mathbf{x})$ on an $n$-dimensional cell; then we shall extend this definition to any bounded region in $R^n$.


7.9.1. The Riemann Integral on Cells

Let $f: D \to R$, where $D \subset R^n$. Suppose that $c_n(\mathbf{a}, \mathbf{b})$ is an $n$-dimensional cell contained in $D$ and that $f$ is bounded on $c_n(\mathbf{a}, \mathbf{b})$. Let $P$ be a partition of $c_n(\mathbf{a}, \mathbf{b})$ consisting of the subcells $S_1, S_2, \ldots, S_\nu$. Let $m_i$ and $M_i$ be, respectively, the infimum and supremum of $f$ on $S_i$, $i = 1, 2, \ldots, \nu$. Consider the sums

$$LS_P(f) = \sum_{i=1}^{\nu} m_i\, \mu(S_i), \qquad (7.58)$$

$$US_P(f) = \sum_{i=1}^{\nu} M_i\, \mu(S_i). \qquad (7.59)$$

We note the similarity of these sums to the ones defined in Section 6.2. As before, we refer to $LS_P(f)$ and $US_P(f)$ as the lower and upper sums, respectively, of $f$ with respect to the partition $P$.
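To make (7.58) and (7.59) concrete, here is a minimal sketch for $f(x_1, x_2) = x_1x_2$ on the unit square with a uniform $k \times k$ partition. It assumes that $f$ is nondecreasing in each variable, which holds for this particular $f$. Under that assumption $m_i$ and $M_i$ are attained at the lower-left and upper-right corners of each subcell. A general $f$ would need a genuine infimum and supremum on every $S_i$.

```python
def lower_upper_sums(f, a, b, k):
    """Lower and upper sums (7.58)-(7.59) of f on [a1, b1] x [a2, b2].

    Assumes f is nondecreasing in each variable on the cell, so that the
    infimum m_i and supremum M_i on each subcell S_i occur at its lower-left
    and upper-right corners.  Uses a uniform k x k partition P.
    """
    h1 = (b[0] - a[0]) / k
    h2 = (b[1] - a[1]) / k
    area = h1 * h2                      # content mu(S_i) of every subcell
    lower = upper = 0.0
    for i in range(k):
        for j in range(k):
            x_lo, y_lo = a[0] + i * h1, a[1] + j * h2
            lower += f(x_lo, y_lo) * area
            upper += f(x_lo + h1, y_lo + h2) * area
    return lower, upper


f = lambda x1, x2: x1 * x2
for k in (4, 16, 64):
    ls, us = lower_upper_sums(f, (0.0, 0.0), (1.0, 1.0), k)
    print(k, ls, us, us - ls)
```

For this function one can check by hand that $LS_P(f) = ((k-1)/2k)^2$ and $US_P(f) = ((k+1)/2k)^2$. Their difference is $1/k$, which tends to zero as the partition is refined, in line with Theorem 7.9.1, and both sums bracket the integral $1/4$.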

The following theorem is an $n$-dimensional analogue of Theorem 6.2.1. The proof is left to the reader.

Theorem 7.9.1. Let $f: D \to R$, where $D \subset R^n$. Suppose that $f$ is bounded on $c_n(\mathbf{a}, \mathbf{b}) \subset D$. Then $f$ is Riemann integrable on $c_n(\mathbf{a}, \mathbf{b})$ if and only if for every $\epsilon > 0$ there exists a partition $P$ of $c_n(\mathbf{a}, \mathbf{b})$ such that

$$US_P(f) - LS_P(f) < \epsilon.$$

Definition 7.9.2. Let $P_1$ and $P_2$ be two partitions of $c_n(\mathbf{a}, \mathbf{b})$. Then $P_2$ is a refinement of $P_1$ if every point in $P_1$ is also a point in $P_2$, that is, $P_1 \subset P_2$. □





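Using this definition, it is possible to prove results similar to those of Lemmas 6.2.1 and 6.2.2. In particular, we have the following lemma:

Lemma 7.9.1. Let $f: D \to R$, where $D \subset R^n$. Suppose that $f$ is bounded on $c_n(\mathbf{a}, \mathbf{b}) \subset D$. Then $\sup_P LS_P(f)$ and $\inf_P US_P(f)$ exist, and

$$\sup_P LS_P(f) \le \inf_P US_P(f).$$

Definition 7.9.3. Let $f: c_n(\mathbf{a}, \mathbf{b}) \to R$ be a bounded function. Then $f$ is Riemann integrable on $c_n(\mathbf{a}, \mathbf{b})$ if and only if

$$\sup_P LS_P(f) = \inf_P US_P(f). \qquad (7.60)$$

The common value of the two sides of (7.60) is called the Riemann integral of $f$ on $c_n(\mathbf{a}, \mathbf{b})$ and is denoted by $\int_{c_n(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x}$. This integral is also written as $\int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$. For example, for $n = 2, 3$ we have

$$\int_{c_2(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x} = \int_{a_1}^{b_1} \int_{a_2}^{b_2} f(x_1, x_2)\, dx_1\, dx_2, \qquad (7.61)$$

$$\int_{c_3(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x} = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \int_{a_3}^{b_3} f(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3. \qquad (7.62)$$

The integral in formula (7.61) is called a double Riemann integral, and the one in formula (7.62) is called a triple Riemann integral. In general, for $n \ge 2$, $\int_{c_n(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x}$ is called an $n$-tuple Riemann integral. □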


The integral $\int_{c_n(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x}$ has properties similar to those of a single-variable Riemann integral in Section 6.4. The following theorem is an extension of Theorem 6.3.1.

Theorem 7.9.2. If $f$ is continuous on an $n$-dimensional cell $c_n(\mathbf{a}, \mathbf{b})$, then it is Riemann integrable there.


7.9.2. Iterated Riemann Integrals on Cells

The definition of the $n$-tuple Riemann integral in Section 7.9.1 does not provide a practicable way to evaluate it. We now show that this integral can be evaluated by performing $n$ Riemann integrals, each of which is carried out with respect to one variable. Let us first consider the double integral in formula (7.61).

Lemma 7.9.2. Suppose that $f$ is real-valued and continuous on $c_2(\mathbf{a}, \mathbf{b})$. Define the function $g(x_2)$ as

$$g(x_2) = \int_{a_1}^{b_1} f(x_1, x_2)\, dx_1.$$

Then $g(x_2)$ is continuous on $[a_2, b_2]$.





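Proof. Let $\epsilon > 0$ be given. Since $f$ is continuous on $c_2(\mathbf{a}, \mathbf{b})$, which is closed and bounded, then by Theorem 7.3.2, $f$ is uniformly continuous on $c_2(\mathbf{a}, \mathbf{b})$. We can therefore find a $\delta > 0$ such that

$$|f(\boldsymbol{\xi}) - f(\boldsymbol{\eta})| < \frac{\epsilon}{b_1 - a_1}$$

if $\|\boldsymbol{\xi} - \boldsymbol{\eta}\| < \delta$, where $\boldsymbol{\xi} = (x_1, x_2)'$, $\boldsymbol{\eta} = (y_1, y_2)'$, and $x_1, y_1 \in [a_1, b_1]$, $x_2, y_2 \in [a_2, b_2]$. It follows that if $|y_2 - x_2| < \delta$, then

$$|g(y_2) - g(x_2)| = \left| \int_{a_1}^{b_1} [f(x_1, y_2) - f(x_1, x_2)]\, dx_1 \right| \le \int_{a_1}^{b_1} |f(x_1, y_2) - f(x_1, x_2)|\, dx_1 < \int_{a_1}^{b_1} \frac{\epsilon}{b_1 - a_1}\, dx_1, \qquad (7.63)$$

since $\|(x_1, y_2)' - (x_1, x_2)'\| = |y_2 - x_2| < \delta$. From inequality (7.63) we conclude that

$$|g(y_2) - g(x_2)| < \epsilon$$

if $|y_2 - x_2| < \delta$. Hence, $g(x_2)$ is continuous on $[a_2, b_2]$. Consequently, from Theorem 6.3.1, $g(x_2)$ is Riemann integrable on $[a_2, b_2]$, that is, $\int_{a_2}^{b_2} g(x_2)\, dx_2$ exists. We call the integral

$$\int_{a_2}^{b_2} \left[ \int_{a_1}^{b_1} f(x_1, x_2)\, dx_1 \right] dx_2 \qquad (7.64)$$

an iterated integral of order 2. □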

The next theorem states that the iterated integral (7.64) is equal to the double integral $\int_{c_2(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x}$.

Theorem 7.9.3. If $f$ is continuous on $c_2(\mathbf{a}, \mathbf{b})$, then

$$\int_{c_2(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x} = \int_{a_2}^{b_2} \left[ \int_{a_1}^{b_1} f(x_1, x_2)\, dx_1 \right] dx_2.$$

Proof. Exercise 7.22. □

We note that the iterated integral in (7.64) was obtained by integrating first with respect to $x_1$, then with respect to $x_2$. This order of integration could have been reversed, that is, we could have integrated $f$ with respect to $x_2$ and then with respect to $x_1$. The result would be the same in both cases. This is based on the following theorem due to Guido Fubini (1879-1943).


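Theorem 7.9.4 (Fubini's Theorem). If $f$ is continuous on $c_2(\mathbf{a}, \mathbf{b})$, then

$$\int_{c_2(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x} = \int_{a_2}^{b_2} \left[ \int_{a_1}^{b_1} f(x_1, x_2)\, dx_1 \right] dx_2 = \int_{a_1}^{b_1} \left[ \int_{a_2}^{b_2} f(x_1, x_2)\, dx_2 \right] dx_1.$$

Proof. See Corwin and Szczarba (1982, page 287). □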
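Fubini's theorem can be illustrated numerically. The sketch below uses composite Simpson's rule as the one-dimensional integrator, and its helper names are hypothetical. It evaluates the iterated integral of $f(x_1, x_2) = x_1x_2^2$ over $[0, 1] \times [0, 2]$ in both orders. Simpson's rule is exact for polynomials of degree up to three, so both orders should reproduce the exact value $4/3$ up to rounding.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def iterated(f, a1, b1, a2, b2, inner="x1"):
    """Iterated integral of f(x1, x2) over [a1, b1] x [a2, b2].

    inner="x1": integrate over x1 first, then x2 (the order in (7.64));
    inner="x2": the reversed order allowed by Fubini's theorem.
    """
    if inner == "x1":
        return simpson(lambda x2: simpson(lambda x1: f(x1, x2), a1, b1), a2, b2)
    return simpson(lambda x1: simpson(lambda x2: f(x1, x2), a2, b2), a1, b1)


f = lambda x1, x2: x1 * x2 ** 2           # exact value on [0, 1] x [0, 2] is 4/3
print(iterated(f, 0.0, 1.0, 0.0, 2.0, "x1"), iterated(f, 0.0, 1.0, 0.0, 2.0, "x2"), 4 / 3)
```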


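A generalization of this theorem to multiple integrals of order $n$ is given by the next theorem [see Corwin and Szczarba (1982, Section 11.1)].

Theorem 7.9.5 (Generalized Fubini's Theorem). If $f$ is continuous on $c_n(\mathbf{a}, \mathbf{b}) = \{\mathbf{x} \mid a_i \le x_i \le b_i,\ i = 1, 2, \ldots, n\}$, then

$$\int_{c_n(\mathbf{a}, \mathbf{b})} f(\mathbf{x})\, d\mathbf{x} = \int_{c_{n-1}(\mathbf{a}_{(i)}, \mathbf{b}_{(i)})} \left[ \int_{a_i}^{b_i} f(\mathbf{x})\, dx_i \right] d\mathbf{x}_{(i)}, \qquad i = 1, 2, \ldots, n,$$

where $d\mathbf{x}_{(i)} = dx_1\, dx_2 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n$ and $c_{n-1}(\mathbf{a}_{(i)}, \mathbf{b}_{(i)})$ is an $(n-1)$-dimensional cell such that $a_1 \le x_1 \le b_1$, $a_2 \le x_2 \le b_2$, $\ldots$, $a_{i-1} \le x_{i-1} \le b_{i-1}$, $a_{i+1} \le x_{i+1} \le b_{i+1}$, $\ldots$, $a_n \le x_n \le b_n$.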


7.9.3. Integration over General Sets

We now consider $n$-tuple Riemann integration over regions in $R^n$ that are not necessarily cell shaped as in Section 7.9.1.

Let $f: D \to R$ be a bounded and continuous function, where $D$ is a bounded region in $R^n$. There exists an $n$-dimensional cell $c_n(\mathbf{a}, \mathbf{b})$ such that $D \subset c_n(\mathbf{a}, \mathbf{b})$. Let $g: c_n(\mathbf{a}, \mathbf{b}) \to R$ be defined as

$$g(\mathbf{x}) = \begin{cases} f(\mathbf{x}), & \mathbf{x} \in D, \\ 0, & \mathbf{x} \notin D. \end{cases}$$

Then

$$\int_D f(\mathbf{x})\, d\mathbf{x} = \int_{c_n(\mathbf{a}, \mathbf{b})} g(\mathbf{x})\, d\mathbf{x}. \qquad (7.65)$$

The integral on the right-hand side of (7.65) is independent of the choice of $c_n(\mathbf{a}, \mathbf{b})$ provided that it contains $D$. It should be noted that the function $g(\mathbf{x})$ may not be continuous on $Br(D)$, the boundary of $D$. This, however, should not affect the existence of the integral on the left-hand side of (7.65). The reason for this is given in Theorem 7.9.7. First, we need to define the so-called Jordan content of a set.





Definition 7.9.4. Let $D$ be a bounded set such that $D \subset c_n(\mathbf{a}, \mathbf{b})$ for some $n$-dimensional cell. Let the function $\lambda_D: R^n \to R$ be defined as

$$\lambda_D(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} \in D, \\ 0, & \mathbf{x} \notin D. \end{cases}$$

This is called the characteristic function of $D$. Suppose that

$$\sup_P LS_P(\lambda_D) = \inf_P US_P(\lambda_D), \qquad (7.66)$$

where $LS_P(\lambda_D)$ and $US_P(\lambda_D)$ are, respectively, the lower and upper sums of $\lambda_D(\mathbf{x})$ with respect to a partition $P$ of $c_n(\mathbf{a}, \mathbf{b})$. Then $D$ is said to have an $n$-dimensional Jordan content denoted by $\mu_j(D)$, where $\mu_j(D)$ is equal to the common value of the terms in equality (7.66). In this case, $D$ is said to be Jordan measurable. □

The proofs of the next two theorems can be found in Sagan (1974, Chapter 11).

Theorem 7.9.6. A bounded set $D \subset R^n$ is Jordan measurable if and only if its boundary $Br(D)$ has a Jordan content equal to zero.

Theorem 7.9.7. Let $f: D \to R$, where $D \subset R^n$ is bounded and Jordan measurable. If $f$ is bounded and continuous in $D$ except on a set that has a Jordan content equal to zero, then $\int_D f(\mathbf{x})\, d\mathbf{x}$ exists.

It follows from Theorems 7.9.6 and 7.9.7 that the integral in equality (7.65) must exist even though $g(\mathbf{x})$ may not be continuous on the boundary $Br(D)$ of $D$, since $Br(D)$ has a Jordan content equal to zero.

Example 7.9.1. Let $f(x_1, x_2) = x_1x_2$ and $D$ be the region

$$D = \{(x_1, x_2) \mid x_1^2 + x_2^2 \le 1,\ x_1 \ge 0,\ x_2 \ge 0\}.$$

It is easy to see that $D$ is contained inside the two-dimensional cell

$$c_2(\mathbf{0}, \mathbf{1}) = \{(x_1, x_2) \mid 0 \le x_1 \le 1,\ 0 \le x_2 \le 1\}.$$

Then

$$\iint_D x_1x_2\, dx_1\, dx_2 = \int_0^1 \left[ \int_0^{(1 - x_1^2)^{1/2}} x_1x_2\, dx_2 \right] dx_1.$$

We note that for a fixed $x_1$ in $[0, 1]$, the part of the line through $(x_1, 0)$ that lies inside $D$ and is parallel to the $x_2$-axis is in fact the interval $0 \le x_2 \le (1 - x_1^2)^{1/2}$. For this reason, the limits of $x_2$ are 0 and $(1 - x_1^2)^{1/2}$. Consequently,

$$\int_0^1 \left[ \int_0^{(1 - x_1^2)^{1/2}} x_1x_2\, dx_2 \right] dx_1 = \int_0^1 \frac{x_1(1 - x_1^2)}{2}\, dx_1 = \frac{1}{8}.$$

In practice, it is not always necessary to make reference to $c_n(\mathbf{a}, \mathbf{b})$ that encloses $D$ in order to evaluate the integral on $D$. Rather, we only need to recognize that the limits of integration in the iterated Riemann integral depend in general on variables that have not yet been integrated out, as was seen in Example 7.9.1. Care should therefore be exercised in correctly identifying the limits of integration. By changing the order of integration (according to Fubini's theorem), it is possible to facilitate the evaluation of the integral.


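Example 7.9.2. Consider $\iint_D e^{x_2^2}\, dx_1\, dx_2$, where $D$ is the region in the first quadrant bounded by $x_2 = 1$ and $x_1 = x_2$. In this example, it is easier to integrate first with respect to $x_1$ and then with respect to $x_2$, since $e^{x_2^2}$ has no elementary antiderivative with respect to $x_2$. Thus

$$\iint_D e^{x_2^2}\, dx_1\, dx_2 = \int_0^1 \left[ \int_0^{x_2} e^{x_2^2}\, dx_1 \right] dx_2 = \int_0^1 x_2 e^{x_2^2}\, dx_2 = \frac{1}{2}(e - 1).$$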
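For Example 7.9.2, a sketch with Simpson's rule can evaluate both orders of integration. Integrating first over $x_1$ reproduces the closed form. Integrating first over $x_2$ leads to an inner integral of $e^{x_2^2}$, which has to be computed numerically. The `simpson` helper and its panel count are assumptions of this illustration.

```python
import math


def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule; gives 0 for a zero-length interval."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


integrand = lambda x1, x2: math.exp(x2 ** 2)

# Order used in Example 7.9.2: x1 from 0 to x2 (inner), then x2 from 0 to 1.
x1_first = simpson(lambda x2: simpson(lambda x1: integrand(x1, x2), 0.0, x2), 0.0, 1.0)

# Reversed order: x2 from x1 to 1 (inner), then x1 from 0 to 1.  The inner
# integral has no elementary antiderivative, so it is done numerically here.
x2_first = simpson(lambda x1: simpson(lambda x2: integrand(x1, x2), x1, 1.0), 0.0, 1.0)

exact = (math.e - 1) / 2
print(x1_first, x2_first, exact)
```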



Example 7.9.3. Consider the intégral / j^ixl +x\)dxi dx 2 , where Z) is a 
région in the first quadrant bounded by X 2 =xj and x^ =X 2 * Hence, 



xl +X 2 ) dx^ dx2 



959 

4680 • 


7.9.4. Change of Variables in $n$-Tuple Riemann Integrals

In this section we give an extension of the change of variables formula in Section 6.4.1 to $n$-tuple Riemann integrals.





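Theorem 7.9.8. Suppose that $D$ is a closed and bounded set in $R^n$. Let $f: D \to R$ be continuous. Suppose that $\mathbf{h}: D \to R^n$ is a one-to-one function with continuous first-order partial derivatives such that the Jacobian determinant,

$$\det[\mathbf{J}_{\mathbf{h}}(\mathbf{x})] = \frac{\partial(h_1, h_2, \ldots, h_n)}{\partial(x_1, x_2, \ldots, x_n)},$$

is different from zero for all $\mathbf{x}$ in $D$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ and $h_i$ is the $i$th element of $\mathbf{h}$ ($i = 1, 2, \ldots, n$). Then

$$\int_D f(\mathbf{x})\, d\mathbf{x} = \int_{D'} f[\mathbf{g}(\mathbf{u})]\, |\det[\mathbf{J}_{\mathbf{g}}(\mathbf{u})]|\, d\mathbf{u}, \qquad (7.67)$$

where $D' = \mathbf{h}(D)$, $\mathbf{u} = \mathbf{h}(\mathbf{x})$, $\mathbf{g}$ is the inverse function of $\mathbf{h}$, and

$$\det[\mathbf{J}_{\mathbf{g}}(\mathbf{u})] = \frac{\partial(g_1, g_2, \ldots, g_n)}{\partial(u_1, u_2, \ldots, u_n)}, \qquad (7.68)$$

where $g_i$ and $u_i$ are, respectively, the $i$th elements of $\mathbf{g}$ and $\mathbf{u}$ ($i = 1, 2, \ldots, n$).

Proof. See, for example, Corwin and Szczarba (1982, Theorem 6.2), or Sagan (1974, Theorem 115.1). □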

Example 7.9.4. Consider the intégral j jjyx^x\dx^dx 2 , where D is 
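bounded by the four parabolas $x_2^2 = x_1$, $x_2^2 = 3x_1$, $x_1^2 = x_2$, $x_1^2 = 4x_2$. Let $u_1 = x_2^2/x_1$, $u_2 = x_1^2/x_2$. The inverse transformation is given by

$$x_1 = (u_1u_2^2)^{1/3}, \qquad x_2 = (u_1^2u_2)^{1/3}.$$

From formula (7.68) we have

$$\frac{\partial(g_1, g_2)}{\partial(u_1, u_2)} = \frac{\partial(x_1, x_2)}{\partial(u_1, u_2)} = -\frac{1}{3}.$$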
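The Jacobian $-1/3$ in Example 7.9.4 can be checked with a central-difference approximation. In this sketch `h` is the forward map $\mathbf{u} = \mathbf{h}(\mathbf{x})$ and `g` its inverse, following the notation of Theorem 7.9.8. The test points are arbitrary choices with $1 \le u_1 \le 3$ and $1 \le u_2 \le 4$, and the step size is an illustrative assumption.

```python
def g(u1, u2):
    """Inverse transformation of Example 7.9.4: (u1, u2) -> (x1, x2)."""
    return (u1 * u2 ** 2) ** (1 / 3), (u1 ** 2 * u2) ** (1 / 3)


def h(x1, x2):
    """Forward transformation: u1 = x2^2 / x1, u2 = x1^2 / x2."""
    return x2 ** 2 / x1, x1 ** 2 / x2


def jacobian_det(func, p, q, step=1e-6):
    """Central-difference estimate of d(func_1, func_2)/d(p, q)."""
    (a_p, b_p), (a_m, b_m) = func(p + step, q), func(p - step, q)
    (c_p, d_p), (c_m, d_m) = func(p, q + step), func(p, q - step)
    d1dp, d2dp = (a_p - a_m) / (2 * step), (b_p - b_m) / (2 * step)
    d1dq, d2dq = (c_p - c_m) / (2 * step), (d_p - d_m) / (2 * step)
    return d1dp * d2dq - d1dq * d2dp


for u in [(1.0, 1.0), (2.0, 3.0), (3.0, 4.0), (1.5, 2.5)]:
    x = g(*u)
    print(u, h(*x), jacobian_det(g, *u))   # h(g(u)) should recover u; Jacobian about -1/3
```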

By applying formula (7.67) we obtain 


X 


i = ( 


u^u 



where $D'$ is a rectangular region in the $u_1u_2$ space bounded by the lines $u_1 = 1, 3$; $u_2 = 1, 4$. Hence,








7.10. DIFFERENTIATION UNDER THE INTEGRAL SIGN 

Suppose that $f(x_1, x_2, \ldots, x_n)$ is a real-valued function defined on $D \subset R^n$. If some of the $x_i$'s, for example, $x_{m+1}, x_{m+2}, \ldots, x_n$ ($n > m$), are integrated out, we obtain a function that depends only on the remaining variables. In this section we discuss conditions under which the latter function is differentiable. For simplicity, we shall only consider functions of $n = 2$ variables.

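Theorem 7.10.1. Let $f: D \to R$, where $D \subset R^2$ contains the two-dimensional cell $c_2(\mathbf{a}, \mathbf{b}) = \{(x_1, x_2) \mid a_1 \le x_1 \le b_1,\ a_2 \le x_2 \le b_2\}$. Suppose that $f$ is continuous and has a continuous first-order partial derivative with respect to $x_2$ in $D$. Then, for $a_2 < x_2 < b_2$,

$$\frac{d}{dx_2} \int_{a_1}^{b_1} f(x_1, x_2)\, dx_1 = \int_{a_1}^{b_1} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1. \qquad (7.69)$$

Proof. Let $h(x_2)$ be defined on $[a_2, b_2]$ as

$$h(x_2) = \int_{a_1}^{b_1} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1, \qquad a_2 \le x_2 \le b_2.$$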


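Since $\partial f / \partial x_2$ is continuous, then by Lemma 7.9.2, $h(x_2)$ is continuous on $[a_2, b_2]$. Now, let $t$ be such that $a_2 < t < b_2$. By integrating $h(x_2)$ over the interval $[a_2, t]$ we obtain

$$\int_{a_2}^{t} h(x_2)\, dx_2 = \int_{a_2}^{t} \left[ \int_{a_1}^{b_1} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1 \right] dx_2. \qquad (7.70)$$

The order of integration in (7.70) can be reversed by Theorem 7.9.4. We then have

$$\begin{aligned}
\int_{a_2}^{t} h(x_2)\, dx_2 &= \int_{a_1}^{b_1} \left[ \int_{a_2}^{t} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_2 \right] dx_1 \\
&= \int_{a_1}^{b_1} [f(x_1, t) - f(x_1, a_2)]\, dx_1 \\
&= \int_{a_1}^{b_1} f(x_1, t)\, dx_1 - \int_{a_1}^{b_1} f(x_1, a_2)\, dx_1 \\
&= F(t) - F(a_2),
\end{aligned} \qquad (7.71)$$

where $F(y) = \int_{a_1}^{b_1} f(x_1, y)\, dx_1$. If we now apply Theorem 6.4.8 and differentiate the two sides of (7.71) with respect to $t$, we obtain $h(t) = F'(t)$, that is,

$$\int_{a_1}^{b_1} \frac{\partial f(x_1, t)}{\partial t}\, dx_1 = \frac{d}{dt} \int_{a_1}^{b_1} f(x_1, t)\, dx_1. \qquad (7.72)$$

Formula (7.69) now follows from formula (7.72) on replacing $t$ with $x_2$. □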
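Theorem 7.10.1 can be illustrated numerically with the hypothetical choice $f(x_1, x_2) = e^{x_1x_2}$ on $0 \le x_1 \le 1$. The sketch compares a central-difference derivative of $\int_0^1 f\, dx_1$ with the integral of $\partial f / \partial x_2$, using Simpson's rule for both integrals. The closed form $e^{x_2}/x_2 - (e^{x_2} - 1)/x_2^2$ of the latter integral is an elementary calculation, included only as an extra check.

```python
import math


def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with an even number n of panels."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


f = lambda x1, x2: math.exp(x1 * x2)            # hypothetical example function
df_dx2 = lambda x1, x2: x1 * math.exp(x1 * x2)  # its partial derivative in x2
a1, b1 = 0.0, 1.0

F = lambda x2: simpson(lambda x1: f(x1, x2), a1, b1)    # F(x2) = integral of f dx1

x2, step = 0.7, 1e-5
lhs = (F(x2 + step) - F(x2 - step)) / (2 * step)         # d/dx2 of the integral
rhs = simpson(lambda x1: df_dx2(x1, x2), a1, b1)         # integral of df/dx2
print(lhs, rhs)
```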


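Theorem 7.10.2. Let $f$ and $D$ be the same as in Theorem 7.10.1. Furthermore, let $\lambda(x_2)$ and $\theta(x_2)$ be functions defined and having continuous derivatives on $[a_2, b_2]$ such that $a_1 \le \lambda(x_2) \le \theta(x_2) \le b_1$ for all $x_2$ in $[a_2, b_2]$. Then the function $G: [a_2, b_2] \to R$ defined by

$$G(x_2) = \int_{\lambda(x_2)}^{\theta(x_2)} f(x_1, x_2)\, dx_1$$

is differentiable for $a_2 < x_2 < b_2$, and

$$\frac{dG}{dx_2} = \int_{\lambda(x_2)}^{\theta(x_2)} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1 + \theta'(x_2) f[\theta(x_2), x_2] - \lambda'(x_2) f[\lambda(x_2), x_2].$$

Proof. Let us write $G(x_2)$ as $H(\lambda, \theta, x_2)$. Since both $\lambda$ and $\theta$ depend on $x_2$, then by applying the total derivative formula [see formula (7.12)] to $H$ we obtain

$$\frac{dH}{dx_2} = \frac{\partial H}{\partial \lambda} \frac{d\lambda}{dx_2} + \frac{\partial H}{\partial \theta} \frac{d\theta}{dx_2} + \frac{\partial H}{\partial x_2}. \qquad (7.73)$$

Now, by Theorem 6.4.8,

$$\frac{\partial H}{\partial \theta} = f(\theta, x_2), \qquad \frac{\partial H}{\partial \lambda} = -f(\lambda, x_2).$$

Furthermore, by Theorem 7.10.1,

$$\frac{\partial H}{\partial x_2} = \int_{\lambda}^{\theta} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1.$$

By making the proper substitution in formula (7.73) we finally conclude that

$$\frac{d}{dx_2} \int_{\lambda(x_2)}^{\theta(x_2)} f(x_1, x_2)\, dx_1 = \int_{\lambda(x_2)}^{\theta(x_2)} \frac{\partial f(x_1, x_2)}{\partial x_2}\, dx_1 + \theta'(x_2) f[\theta(x_2), x_2] - \lambda'(x_2) f[\lambda(x_2), x_2].$$

□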
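A similar sketch checks the formula of Theorem 7.10.2 for the hypothetical choices $f(x_1, x_2) = \sin(x_1x_2)$, $\lambda(x_2) = x_2$, and $\theta(x_2) = x_2^2$. The check is made near $x_2 = 1.2$ and $x_2 = 1.5$, where $\lambda(x_2) \le \theta(x_2)$ holds. The left-hand side is a central difference of $G$, and the integrals are computed with Simpson's rule.

```python
import math


def simpson(g, lo, hi, n=400):
    """Composite Simpson's rule with an even number n of panels."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


f = lambda x1, x2: math.sin(x1 * x2)
df_dx2 = lambda x1, x2: x1 * math.cos(x1 * x2)
lam, dlam = (lambda x2: x2), (lambda x2: 1.0)              # lower limit and its derivative
theta, dtheta = (lambda x2: x2 ** 2), (lambda x2: 2 * x2)  # upper limit and its derivative


def G(x2):
    return simpson(lambda x1: f(x1, x2), lam(x2), theta(x2))


def dG_formula(x2):
    """Right-hand side of Theorem 7.10.2."""
    return (simpson(lambda x1: df_dx2(x1, x2), lam(x2), theta(x2))
            + dtheta(x2) * f(theta(x2), x2)
            - dlam(x2) * f(lam(x2), x2))


x2, step = 1.5, 1e-5
numeric = (G(x2 + step) - G(x2 - step)) / (2 * step)
print(numeric, dG_formula(x2))
```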


Example 7.10.1. 


d 

dx2 



Xjxf 



e dxi 


— sin X 2 (x| cos ^2 — 1) e 

— 2 x 2 ( x 2 — 1 ) e~^K 


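Theorems 7.10.1 and 7.10.2 can be used to evaluate certain integrals of the form $\int_a^b f(x)\, dx$. For example, consider the integral

$$\int_0^{\pi} x_1^2 \cos x_1\, dx_1.$$

Define the function

$$F(x_2) = \int_0^{\pi} \cos(x_1x_2)\, dx_1,$$

where $x_2 \ge 1$. Then

$$F(x_2) = \left[ \frac{1}{x_2} \sin(x_1x_2) \right]_{x_1 = 0}^{x_1 = \pi} = \frac{1}{x_2} \sin(\pi x_2).$$

If we now differentiate $F(x_2)$ two times, we obtain

$$F''(x_2) = \frac{2\sin(\pi x_2) - 2\pi x_2 \cos(\pi x_2) - \pi^2 x_2^2 \sin(\pi x_2)}{x_2^3}.$$

By Theorem 7.10.1, applied twice, $F''(x_2) = -\int_0^{\pi} x_1^2 \cos(x_1x_2)\, dx_1$. Thus

$$\int_0^{\pi} x_1^2 \cos(x_1x_2)\, dx_1 = \frac{2\pi x_2 \cos(\pi x_2) + \pi^2 x_2^2 \sin(\pi x_2) - 2\sin(\pi x_2)}{x_2^3}.$$

By replacing $x_2$ with 1 we obtain

$$\int_0^{\pi} x_1^2 \cos x_1\, dx_1 = -2\pi.$$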
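The closed form just obtained can be compared with direct numerical integration. The sketch uses Simpson's rule, an assumption of this illustration, at a few values of $x_2$. At $x_2 = 1$ the result should be $-2\pi$.

```python
import math


def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule with an even number n of panels."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def closed_form(x2):
    """int_0^pi x1^2 cos(x1 x2) dx1 = -F''(x2), where F(x2) = sin(pi x2) / x2."""
    s, c = math.sin(math.pi * x2), math.cos(math.pi * x2)
    return (2 * math.pi * x2 * c + math.pi ** 2 * x2 ** 2 * s - 2 * s) / x2 ** 3


for x2 in (1.0, 1.3, 2.5):
    numeric = simpson(lambda x1: x1 ** 2 * math.cos(x1 * x2), 0.0, math.pi)
    print(x2, numeric, closed_form(x2))

print(closed_form(1.0), -2 * math.pi)
```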



7.11. APPLICATIONS IN STATISTICS 

Multidimensional calculus provides a theoretical framework for the study of multivariate distributions, that is, joint distributions of several random variables. It can also be used to estimate the parameters of a statistical model. We now provide details of some of these applications.

Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ be a random vector. The distribution of $\mathbf{X}$ is characterized by its cumulative distribution function, namely,

$$F(\mathbf{x}) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n), \qquad (7.74)$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. If $F(\mathbf{x})$ is continuous and has an $n$th-order mixed partial derivative with respect to $x_1, x_2, \ldots, x_n$, then the function

$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{x})}{\partial x_1\, \partial x_2 \cdots \partial x_n}$$

is called the density function of $\mathbf{X}$. In this case, formula (7.74) can be written in the form

$$F(\mathbf{x}) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f(\mathbf{z})\, d\mathbf{z},$$

where $\mathbf{z} = (z_1, z_2, \ldots, z_n)'$. If the random variable $X_i$ ($i = 1, 2, \ldots, n$) is considered separately, then its distribution function is called the $i$th marginal distribution of $\mathbf{X}$. Its density function $f_i(x_i)$, called the $i$th marginal density function, can be obtained by integrating out the remaining $n - 1$ variables from $f(\mathbf{x})$. For example, if $\mathbf{X} = (X_1, X_2)'$, then the marginal density function of $X_1$ is

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2.$$

Similarly, the marginal density function of $X_2$ is

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1.$$


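In particular, if $X_1, X_2, \ldots, X_n$ are independent random variables, then the density function of $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ is the product of all the associated marginal density functions, that is, $f(\mathbf{x}) = \prod_{i=1}^{n} f_i(x_i)$.

If only $n - 2$ variables are integrated out from $f(\mathbf{x})$, we obtain the so-called bivariate density function of the remaining two variables. For example, if $\mathbf{X} = (X_1, X_2, X_3, X_4)'$, the bivariate density function of $X_1$ and $X_2$ is

$$f_{12}(x_1, x_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4)\, dx_3\, dx_4.$$

Now, the mean of $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ is $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)'$, where

$$\mu_i = \int_{-\infty}^{\infty} x_i f_i(x_i)\, dx_i, \qquad i = 1, 2, \ldots, n.$$

The variance-covariance matrix of $\mathbf{X}$ is the $n \times n$ matrix $\boldsymbol{\Sigma} = (\sigma_{ij})$, where

$$\sigma_{ij} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f_{ij}(x_i, x_j)\, dx_i\, dx_j,$$

where $\mu_i$ and $\mu_j$ are the means of $X_i$ and $X_j$, respectively, and $f_{ij}(x_i, x_j)$ is the bivariate density function of $X_i$ and $X_j$, $i \ne j$. If $i = j$, then $\sigma_{ii}$ is the variance of $X_i$, where

$$\sigma_{ii} = \int_{-\infty}^{\infty} (x_i - \mu_i)^2 f_i(x_i)\, dx_i, \qquad i = 1, 2, \ldots, n.$$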
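As a minimal numerical illustration of these formulas, take the hypothetical joint density $f(x_1, x_2) = x_1 + x_2$ on $[0, 1] \times [0, 1]$, zero elsewhere. Its exact moments are $\mu_1 = 7/12$, $\sigma_{11} = 11/144$, and $\sigma_{12} = -1/144$. The sketch approximates them with the midpoint rule. With $n = 2$ the bivariate density $f_{12}$ is $f$ itself, and integrating $(x_1 - \mu_1)^2 f$ over both variables is the same as integrating $(x_1 - \mu_1)^2 f_1(x_1)$.

```python
def midpoint_2d(func, k=200):
    """Midpoint-rule approximation of a double integral over [0, 1] x [0, 1]."""
    h = 1.0 / k
    total = 0.0
    for i in range(k):
        x1 = (i + 0.5) * h
        for j in range(k):
            total += func(x1, (j + 0.5) * h)
    return total * h * h


density = lambda x1, x2: x1 + x2              # hypothetical joint density on [0, 1]^2

total_mass = midpoint_2d(density)             # should be 1
mu1 = midpoint_2d(lambda x1, x2: x1 * density(x1, x2))
mu2 = midpoint_2d(lambda x1, x2: x2 * density(x1, x2))
sigma11 = midpoint_2d(lambda x1, x2: (x1 - mu1) ** 2 * density(x1, x2))
sigma12 = midpoint_2d(lambda x1, x2: (x1 - mu1) * (x2 - mu2) * density(x1, x2))
print(total_mass, mu1, sigma11, sigma12)
```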


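7.11.1. Transformations of Random Vectors

In this section we consider a multivariate extension of formula (6.73) regarding the density function of a function of a single random variable. This is given in the next theorem.

Theorem 7.11.1. Let $\mathbf{X}$ be a random vector with a continuous density function $f(\mathbf{x})$. Let $\mathbf{g}: D \to R^n$, where $D$ is an open subset of $R^n$ such that $P(\mathbf{X} \in D) = 1$. Suppose that $\mathbf{g}$ satisfies the conditions of the inverse function theorem (Theorem 7.6.1), namely the following:

i. $\mathbf{g}$ has continuous first-order partial derivatives in $D$.

ii. The Jacobian matrix $\mathbf{J}_{\mathbf{g}}(\mathbf{x})$ is nonsingular in $D$, that is,

$$\det[\mathbf{J}_{\mathbf{g}}(\mathbf{x})] = \frac{\partial(g_1, g_2, \ldots, g_n)}{\partial(x_1, x_2, \ldots, x_n)} \ne 0$$

for all $\mathbf{x} \in D$, where $g_i$ is the $i$th element of $\mathbf{g}$ ($i = 1, 2, \ldots, n$).

Then the density function of $\mathbf{Y} = \mathbf{g}(\mathbf{X})$ is given by

$$h(\mathbf{y}) = f[\mathbf{g}^{-1}(\mathbf{y})]\, |\det[\mathbf{J}_{\mathbf{g}^{-1}}(\mathbf{y})]|,$$

where $\mathbf{g}^{-1}$ is the inverse function of $\mathbf{g}$.

Proof. By Theorem 7.6.1, the inverse function of $\mathbf{g}$ exists. Let us therefore write $\mathbf{X} = \mathbf{g}^{-1}(\mathbf{Y})$. Now, the cumulative distribution function of $\mathbf{Y}$ is

$$P(Y_1 \le y_1, \ldots, Y_n \le y_n) = P[g_1(\mathbf{X}) \le y_1, g_2(\mathbf{X}) \le y_2, \ldots, g_n(\mathbf{X}) \le y_n] = \int_{A} f(\mathbf{x})\, d\mathbf{x}, \qquad (7.75)$$

where $A = \{\mathbf{x} \in D \mid g_i(\mathbf{x}) \le y_i,\ i = 1, 2, \ldots, n\}$. If we make the change of variables $\mathbf{w} = \mathbf{g}(\mathbf{x})$ in formula (7.75), then, by applying Theorem 7.9.8 with $\mathbf{g}^{-1}(\mathbf{w})$ used instead of $\mathbf{g}(\mathbf{u})$, we obtain

$$\int_{A} f(\mathbf{x})\, d\mathbf{x} = \int_{\mathbf{g}(A)} f[\mathbf{g}^{-1}(\mathbf{w})]\, |\det[\mathbf{J}_{\mathbf{g}^{-1}}(\mathbf{w})]|\, d\mathbf{w},$$

where $\mathbf{g}(A) = \{\mathbf{g}(\mathbf{x}) \mid g_i(\mathbf{x}) \le y_i,\ i = 1, 2, \ldots, n\}$. Thus

$$P(Y_1 \le y_1, \ldots, Y_n \le y_n) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f[\mathbf{g}^{-1}(\mathbf{w})]\, |\det[\mathbf{J}_{\mathbf{g}^{-1}}(\mathbf{w})]|\, d\mathbf{w}.$$

It follows that the density function of $\mathbf{Y}$ is

$$h(\mathbf{y}) = f[\mathbf{g}^{-1}(\mathbf{y})] \left| \frac{\partial(g_1^{-1}, g_2^{-1}, \ldots, g_n^{-1})}{\partial(y_1, y_2, \ldots, y_n)} \right|, \qquad (7.76)$$

where $g_i^{-1}$ is the $i$th element of $\mathbf{g}^{-1}$ ($i = 1, 2, \ldots, n$). □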


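Example 7.11.1. Let $\mathbf{X} = (X_1, X_2)'$, where $X_1$ and $X_2$ are independent random variables that have the standard normal distribution. Here, the density function of $\mathbf{X}$ is the product of the density functions of $X_1$ and $X_2$. Thus

$$f(\mathbf{x}) = \frac{1}{2\pi} \exp\left[ -\frac{1}{2}(x_1^2 + x_2^2) \right], \qquad -\infty < x_1, x_2 < \infty.$$

Let $\mathbf{Y} = (Y_1, Y_2)'$ be defined as

$$Y_1 = X_1 + X_2, \qquad Y_2 = X_1 - 2X_2.$$

In this case, the set $D$ in Theorem 7.11.1 is $R^2$, $g_1(\mathbf{x}) = x_1 + x_2$, $g_2(\mathbf{x}) = x_1 - 2x_2$, $g_1^{-1}(\mathbf{y}) = x_1 = \frac{1}{3}(2y_1 + y_2)$, $g_2^{-1}(\mathbf{y}) = x_2 = \frac{1}{3}(y_1 - y_2)$, and

$$\frac{\partial(g_1^{-1}, g_2^{-1})}{\partial(y_1, y_2)} = \det \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix} = -\frac{1}{3}.$$

Hence, by formula (7.76), the density function of $\mathbf{Y}$ is

$$h(\mathbf{y}) = \frac{1}{2\pi} \exp\left\{ -\frac{1}{2}\left[ \left(\frac{2y_1 + y_2}{3}\right)^2 + \left(\frac{y_1 - y_2}{3}\right)^2 \right] \right\} \times \frac{1}{3} = \frac{1}{6\pi} \exp\left\{ -\frac{1}{2}\left[ \left(\frac{2y_1 + y_2}{3}\right)^2 + \left(\frac{y_1 - y_2}{3}\right)^2 \right] \right\}, \qquad -\infty < y_1, y_2 < \infty.$$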
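Example 7.11.1 can be cross-checked against a standard fact that the text does not use. A linear transformation of a normal vector is normal, so $\mathbf{Y}$ should be bivariate normal with $\operatorname{Var}(Y_1) = 2$, $\operatorname{Var}(Y_2) = 5$, and $\operatorname{Cov}(Y_1, Y_2) = -1$. The sketch evaluates $h(\mathbf{y})$ from formula (7.76) and compares it with that bivariate normal density. The function names are illustrative.

```python
import math


def f_X(x1, x2):
    """Joint density of two independent standard normal variables."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2 * math.pi)


def h_Y(y1, y2):
    """Density of Y = (X1 + X2, X1 - 2 X2)' from formula (7.76)."""
    x1 = (2 * y1 + y2) / 3                        # g_1^{-1}(y)
    x2 = (y1 - y2) / 3                            # g_2^{-1}(y)
    jac = (2 / 3) * (-1 / 3) - (1 / 3) * (1 / 3)  # d(x1, x2)/d(y1, y2) = -1/3
    return f_X(x1, x2) * abs(jac)


def bivariate_normal(y1, y2, s11, s12, s22):
    """Zero-mean bivariate normal density with covariance [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 * s12
    quad = (s22 * y1 * y1 - 2 * s12 * y1 * y2 + s11 * y2 * y2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


for y in [(0.0, 0.0), (1.0, -0.5), (-2.0, 1.5)]:
    print(y, h_Y(*y), bivariate_normal(*y, 2.0, -1.0, 5.0))
```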


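Example 7.11.2. Suppose that it is desired to determine the density function of the random variable $V = X_1 + X_2$, where $X_1 > 0$, $X_2 > 0$, and $\mathbf{X} = (X_1, X_2)'$ has a continuous density function $f(x_1, x_2)$. This can be accomplished in two ways:

i. Let $Q(v)$ denote the cumulative distribution function of $V$ and let $q(v)$ be its density function. Then

$$Q(v) = P(X_1 + X_2 \le v) = \iint_A f(x_1, x_2)\, dx_1\, dx_2,$$

where $A = \{(x_1, x_2) \mid x_1 > 0,\ x_2 > 0,\ x_1 + x_2 \le v\}$. We can write $Q(v)$ as

$$Q(v) = \int_0^v \left[ \int_0^{v - x_2} f(x_1, x_2)\, dx_1 \right] dx_2.$$

If we now apply Theorem 7.10.2, we obtain the following. The boundary term from the upper limit $x_2 = v$ vanishes, because the inner integral is zero there.

$$q(v) = \frac{dQ(v)}{dv} = \int_0^v f(v - x_2, x_2)\, dx_2. \qquad (7.77)$$

ii. Consider the following transformation:

$$Y_1 = X_1 + X_2, \qquad Y_2 = X_2.$$

Then

$$X_1 = Y_1 - Y_2, \qquad X_2 = Y_2.$$

By Theorem 7.11.1, the density function of $\mathbf{Y} = (Y_1, Y_2)'$ is

$$h(y_1, y_2) = f(y_1 - y_2, y_2) \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| = f(y_1 - y_2, y_2) \left| \det \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \right| = f(y_1 - y_2, y_2), \qquad y_1 \ge y_2 \ge 0.$$

By integrating $y_2$ out we obtain the marginal density function of $Y_1 = V$, namely,

$$q(v) = \int_0^v f(v - y_2, y_2)\, dy_2 = \int_0^v f(v - x_2, x_2)\, dx_2.$$

This is identical to the density function given in formula (7.77).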
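Formula (7.77) can be checked on a hypothetical case with a known answer. If $X_1$ and $X_2$ are independent exponential random variables with rates 1 and 2, then $q(v) = 2(e^{-v} - e^{-2v})$ for $v > 0$. The sketch evaluates (7.77) with Simpson's rule, which is an illustrative choice.

```python
import math


def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with an even number n of panels."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def joint_density(x1, x2):
    """Hypothetical f(x1, x2): independent Exp(rate 1) and Exp(rate 2) variables."""
    if x1 < 0 or x2 < 0:
        return 0.0
    return math.exp(-x1) * 2.0 * math.exp(-2.0 * x2)


def q(v):
    """Density of V = X1 + X2 via formula (7.77): integral of f(v - x2, x2) over [0, v]."""
    return simpson(lambda x2: joint_density(v - x2, x2), 0.0, v)


def q_exact(v):
    return 2.0 * (math.exp(-v) - math.exp(-2.0 * v))


for v in (0.5, 1.0, 3.0):
    print(v, q(v), q_exact(v))
```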


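7.11.2. Maximum Likelihood Estimation

Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from a population whose distribution depends on a set of $p$ parameters, namely $\theta_1, \theta_2, \ldots, \theta_p$. We can regard this sample as forming a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$. Suppose that $\mathbf{X}$ has the density function $f(\mathbf{x}, \boldsymbol{\theta})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$ and $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)'$. This density function is usually referred to as the likelihood function of $\mathbf{X}$; we denote it by $L(\mathbf{x}, \boldsymbol{\theta})$.

For a given sample, the maximum likelihood estimate of $\boldsymbol{\theta}$, denoted by $\hat{\boldsymbol{\theta}}$, is the value of $\boldsymbol{\theta}$ that maximizes $L(\mathbf{x}, \boldsymbol{\theta})$. If $L(\mathbf{x}, \boldsymbol{\theta})$ has partial derivatives with respect to $\theta_1, \theta_2, \ldots, \theta_p$, then $\hat{\boldsymbol{\theta}}$ is often obtained by solving the equations

$$\frac{\partial L(\mathbf{x}, \boldsymbol{\theta})}{\partial \theta_i} = 0, \qquad i = 1, 2, \ldots, p.$$

In most situations, it is more convenient to work with the natural logarithm of $L(\mathbf{x}, \boldsymbol{\theta})$. Since the logarithm is strictly increasing, its maxima are attained at the same points as those of $L(\mathbf{x}, \boldsymbol{\theta})$. Thus $\hat{\boldsymbol{\theta}}$ satisfies the equations

$$\frac{\partial \log L(\mathbf{x}, \boldsymbol{\theta})}{\partial \theta_i} = 0, \qquad i = 1, 2, \ldots, p. \qquad (7.78)$$

Equations (7.78) are known as the likelihood equations.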





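Example 7.11.3. Suppose that $X_1, X_2, \ldots, X_n$ form a sample of size $n$ from a normal distribution with an unknown mean $\mu$ and a variance $\sigma^2$. Here, $\boldsymbol{\theta} = (\mu, \sigma^2)'$, and the likelihood function is given by

$$L(\mathbf{x}, \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right].$$

Let $L^*(\mathbf{x}, \boldsymbol{\theta}) = \log L(\mathbf{x}, \boldsymbol{\theta})$. Then

$$L^*(\mathbf{x}, \boldsymbol{\theta}) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{n}{2} \log(2\pi\sigma^2).$$

The likelihood equations in formula (7.78) are of the form

$$\frac{\partial L^*}{\partial \mu} = \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (x_i - \hat{\mu}) = 0, \qquad (7.79)$$

$$\frac{\partial L^*}{\partial \sigma^2} = \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 - \frac{n}{2\hat{\sigma}^2} = 0. \qquad (7.80)$$

Equations (7.79) and (7.80) can be written as

$$n(\bar{x} - \hat{\mu}) = 0, \qquad (7.81)$$

$$\sum_{i=1}^{n} (x_i - \hat{\mu})^2 - n\hat{\sigma}^2 = 0, \qquad (7.82)$$

where $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$. If $n \ge 2$, then equations (7.81) and (7.82) have the solution

$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

These are the maximum likelihood estimates of $\mu$ and $\sigma^2$, respectively.

It can be verified that $\hat{\mu}$ and $\hat{\sigma}^2$ are indeed the values of $\mu$ and $\sigma^2$ that maximize $L^*(\mathbf{x}, \boldsymbol{\theta})$. To show this, let us consider the Hessian matrix $\mathbf{A}$ of second-order partial derivatives of $L^*$ (see formula 7.34),

$$\mathbf{A} = \begin{bmatrix} \dfrac{\partial^2 L^*}{\partial \mu^2} & \dfrac{\partial^2 L^*}{\partial \mu\, \partial \sigma^2} \\[2ex] \dfrac{\partial^2 L^*}{\partial \mu\, \partial \sigma^2} & \dfrac{\partial^2 L^*}{\partial (\sigma^2)^2} \end{bmatrix}.$$

Hence, for $\mu = \hat{\mu}$ and $\sigma^2 = \hat{\sigma}^2$,

$$\frac{\partial^2 L^*}{\partial \mu^2} = -\frac{n}{\hat{\sigma}^2}, \qquad \frac{\partial^2 L^*}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\hat{\sigma}^4} \sum_{i=1}^{n} (x_i - \hat{\mu}) = 0, \qquad \frac{\partial^2 L^*}{\partial (\sigma^2)^2} = -\frac{n}{2\hat{\sigma}^4}.$$

Thus $\partial^2 L^* / \partial \mu^2 < 0$ and $\det(\mathbf{A}) = n^2/(2\hat{\sigma}^6) > 0$. Therefore, by Corollary 7.7.1, $(\hat{\mu}, \hat{\sigma}^2)$ is a point of local maximum of $L^*$. Since it is the only maximum, it must also be the absolute maximum.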
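Here is a minimal sketch of Example 7.11.3 on a made-up sample. It computes $\hat{\mu}$ and $\hat{\sigma}^2$ and compares them with Python's `statistics.fmean` and `statistics.pvariance`; the latter uses the divisor $n$, matching $\hat{\sigma}^2$. It also approximates the Hessian of $L^*$ at the estimates by central differences. The sample values and step size are arbitrary choices for illustration.

```python
import math
import statistics


def log_lik(mu, s2, xs):
    """L*(x, theta) = log L for a normal sample, theta = (mu, sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -ss / (2 * s2) - 0.5 * n * math.log(2 * math.pi * s2)


xs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]               # hypothetical sample
n = len(xs)
mu_hat = sum(xs) / n
s2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # divisor n (the MLE), not n - 1

# Central-difference second derivatives of L* at (mu_hat, s2_hat).
L = lambda m, s: log_lik(m, s, xs)
e = 1e-4
d_mm = (L(mu_hat + e, s2_hat) - 2 * L(mu_hat, s2_hat) + L(mu_hat - e, s2_hat)) / e ** 2
d_ss = (L(mu_hat, s2_hat + e) - 2 * L(mu_hat, s2_hat) + L(mu_hat, s2_hat - e)) / e ** 2
d_ms = (L(mu_hat + e, s2_hat + e) - L(mu_hat + e, s2_hat - e)
        - L(mu_hat - e, s2_hat + e) + L(mu_hat - e, s2_hat - e)) / (4 * e ** 2)

print(mu_hat, statistics.fmean(xs), s2_hat, statistics.pvariance(xs))
print(d_mm, -n / s2_hat)                 # analytic value -n / sigma_hat^2
print(d_ss, -n / (2 * s2_hat ** 2))      # analytic value -n / (2 sigma_hat^4)
print(d_ms)                              # analytic value 0
```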


Maximum likelihood estimators have interesting asymptotic properties. For more information on these properties, see, for example, Bickel and Doksum (1977, Section 4.4).


7.11.3. Comparison of Two Unbiased Estimators

Let $X_1$ and $X_2$ be two unbiased estimators of a parameter $\mu$. Suppose that $\mathbf{X} = (X_1, X_2)'$ has the density function $f(x_1, x_2)$, $-\infty < x_1, x_2 < \infty$. To compare these estimators, we may consider the probability that one estimator, for example, $X_1$, is closer to $\mu$ than the other, $X_2$, that is,

$$p = P[|X_1 - \mu| < |X_2 - \mu|].$$

This probability can be expressed as

$$p = \iint_D f(x_1, x_2)\, dx_1\, dx_2, \qquad (7.83)$$

where $D = \{(x_1, x_2) \mid |x_1 - \mu| < |x_2 - \mu|\}$. Let us now make the following change of variables using polar coordinates:

$$x_1 - \mu = r\cos\theta, \qquad x_2 - \mu = r\sin\theta.$$

By applying formula (7.67), the integral in (7.83) can be written as

$$p = \iint_{D'} g(r, \theta) \left| \frac{\partial(x_1, x_2)}{\partial(r, \theta)} \right| dr\, d\theta = \iint_{D'} g(r, \theta)\, r\, dr\, d\theta,$$

where $g(r, \theta) = f(\mu + r\cos\theta, \mu + r\sin\theta)$ and

$$D' = \left\{ (r, \theta) \,\middle|\, 0 \le r < \infty,\ \ \frac{\pi}{4} < \theta < \frac{3\pi}{4} \ \text{ or } \ \frac{5\pi}{4} < \theta < \frac{7\pi}{4} \right\}.$$


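In particular, if $\mathbf{X}$ has the bivariate normal density, then

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu)(x_2 - \mu)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu)^2}{\sigma_2^2} \right] \right\}, \qquad -\infty < x_1, x_2 < \infty,$$

and

$$g(r, \theta) = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}} \exp\left\{ -\frac{r^2}{2(1 - \rho^2)} \left[ \frac{\cos^2\theta}{\sigma_1^2} - \frac{2\rho\cos\theta\sin\theta}{\sigma_1\sigma_2} + \frac{\sin^2\theta}{\sigma_2^2} \right] \right\},$$

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $X_1$ and $X_2$, respectively, and $\rho$ is their correlation coefficient. In this case, since $g(r, \theta + \pi) = g(r, \theta)$, the two angular sectors of $D'$ contribute equally, and

$$p = 2\int_{\pi/4}^{3\pi/4} \left[ \int_0^{\infty} g(r, \theta)\, r\, dr \right] d\theta.$$

It can be shown (see Lowerre, 1983) that

$$p = 1 - \frac{1}{\pi} \operatorname{Arctan}\left[ \frac{2\sigma_1\sigma_2(1 - \rho^2)^{1/2}}{\sigma_2^2 - \sigma_1^2} \right] \qquad (7.84)$$

if $\sigma_2 > \sigma_1$. A large value of $p$ indicates that $X_1$ is closer to $\mu$ than $X_2$, which means that $X_1$ is a better estimator of $\mu$ than $X_2$.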
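Formula (7.84) can be checked against the polar-coordinate integral. In this sketch the radial integral is done in closed form, $\int_0^\infty r\, e^{-cr^2}\, dr = 1/(2c)$, and the remaining integral over $\pi/4 \le \theta \le 3\pi/4$ is computed with Simpson's rule. The parameter values are arbitrary, with $\sigma_2 > \sigma_1$ as (7.84) requires. For $\sigma_1 = 1$, $\sigma_2 = 2$, $\rho = 0$, both values should be about $0.705$, which equals $(2/\pi)\operatorname{Arctan} 2$.

```python
import math


def p_numeric(s1, s2, rho, n=2000):
    """p = 2 * int_{pi/4}^{3pi/4} [int_0^inf g(r, theta) r dr] dtheta for the bivariate normal."""
    k = 1.0 / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

    def radial(theta):
        c, s = math.cos(theta), math.sin(theta)
        quad = c * c / s1 ** 2 - 2 * rho * c * s / (s1 * s2) + s * s / s2 ** 2
        # int_0^inf r exp(-r^2 quad / (2 (1 - rho^2))) dr = (1 - rho^2) / quad
        return k * (1 - rho ** 2) / quad

    lo, hi = math.pi / 4, 3 * math.pi / 4
    h = (hi - lo) / n
    total = radial(lo) + radial(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * radial(lo + i * h)
    # Factor 2: the sector 5pi/4 < theta < 7pi/4 contributes the same amount.
    return 2 * total * h / 3


def p_formula(s1, s2, rho):
    """Formula (7.84); assumes s2 > s1."""
    return 1 - math.atan(2 * s1 * s2 * math.sqrt(1 - rho ** 2) / (s2 ** 2 - s1 ** 2)) / math.pi


for s1, s2, rho in [(1.0, 2.0, 0.0), (1.0, 2.0, 0.5), (0.8, 1.5, -0.3)]:
    print(s1, s2, rho, p_numeric(s1, s2, rho), p_formula(s1, s2, rho))
```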


7.11.4. Best Linear Unbiased Estimation

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with a common mean $\mu$ and a common variance $\sigma^2$. An estimator of the form $\hat{\phi} = \sum_{i=1}^{n} a_i X_i$, where the $a_i$'s are constants, is said to be a linear estimator of $\mu$. This estimator is unbiased if $E(\hat{\phi}) = \mu$, that is, if $\sum_{i=1}^{n} a_i = 1$, since $E(X_i) = \mu$ for $i = 1, 2, \ldots, n$. The variance of $\hat{\phi}$ is given by

$$\operatorname{Var}(\hat{\phi}) = \sigma^2 \sum_{i=1}^{n} a_i^2.$$

The smaller the variance of $\hat{\phi}$, the more efficient $\hat{\phi}$ is as an estimator of $\mu$. In particular, if $a_1, a_2, \ldots, a_n$ are chosen so that $\operatorname{Var}(\hat{\phi})$ attains a minimum value, then $\hat{\phi}$ will have the smallest variance among all unbiased linear estimators of $\mu$. In this case, $\hat{\phi}$ is called the best linear unbiased estimator (BLUE) of $\mu$.

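Thus to find the BLUE of $\mu$ we need to minimize the function $f = \sum_{i=1}^{n} a_i^2$ subject to the constraint $\sum_{i=1}^{n} a_i = 1$. This minimization problem can be solved using the method of Lagrange multipliers. Let us therefore write $F$ [see formula (7.41)] as

$$F = \sum_{i=1}^{n} a_i^2 + \lambda\left( \sum_{i=1}^{n} a_i - 1 \right),$$

$$\frac{\partial F}{\partial a_i} = 2a_i + \lambda = 0, \qquad i = 1, 2, \ldots, n.$$

Hence, $a_i = -\lambda/2$ ($i = 1, 2, \ldots, n$). Using the constraint $\sum_{i=1}^{n} a_i = 1$, we conclude that $\lambda = -2/n$. Thus $a_i = 1/n$, $i = 1, 2, \ldots, n$. To verify that this solution minimizes $f$, we need to consider the signs of $\Delta_1, \Delta_2, \ldots, \Delta_{n-1}$, where $\Delta_i$ is the determinant of $\mathbf{B}_i$ (see Section 7.8). Here, $\mathbf{B}_1$ is an $(n+1) \times (n+1)$ matrix of the form

$$\mathbf{B}_1 = \begin{bmatrix} 2\mathbf{I}_n & \mathbf{1}_n \\ \mathbf{1}_n' & 0 \end{bmatrix}.$$

It follows that

$$\Delta_1 = \det(\mathbf{B}_1) = -\frac{n2^n}{2} < 0, \qquad \Delta_2 = -\frac{(n-1)2^{n-1}}{2} < 0, \qquad \ldots, \qquad \Delta_{n-1} = -2^2 < 0.$$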


Since the number of constraints, $m = 1$, is odd, then by the sufficient conditions described in Section 7.8 we must have a local minimum when $a_i = 1/n$, $i = 1, 2, \ldots, n$. Since this is the only local minimum in $R^n$, it must be the absolute minimum. Note that for such values of $a_1, a_2, \ldots, a_n$, $\hat{\phi}$ is the sample mean $\bar{X}_n$. We conclude that the sample mean is the most efficient (in terms of variance) unbiased linear estimator of $\mu$.
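The determinants quoted above can be reproduced with exact rational arithmetic. The sketch builds $\mathbf{B}_1$ for a given $n$, forms $\mathbf{B}_i$ by deleting the first $i - 1$ rows and columns, and checks the pattern $\Delta_i = -k\,2^{k-1}$ with $k = n - i + 1$. This pattern is equivalent to the values given in the text. The helper names are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bordered(n):
    """B_1 = [[2 I_n, 1_n], [1_n', 0]] for minimizing sum a_i^2 subject to sum a_i = 1."""
    rows = [[2 if i == j else 0 for j in range(n)] + [1] for i in range(n)]
    rows.append([1] * n + [0])
    return rows


def deltas(n):
    """Delta_1, ..., Delta_{n-1}, where B_i drops the first i - 1 rows and columns of B_1."""
    B = bordered(n)
    return [det([row[i:] for row in B[i:]]) for i in range(n - 1)]


print([int(d) for d in deltas(5)])       # expected [-80, -32, -12, -4]

# With m = 1 (odd), all Delta_i < 0 indicates a minimum; compare with other unbiased weights.
n = 5
blue = [Fraction(1, n)] * n
other = [Fraction(1, 2)] + [Fraction(1, 8)] * 4
print(sum(a * a for a in blue), sum(a * a for a in other))
```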


7.11.5. Optimal Choice of Sample Sizes in Stratified Sampling

In stratified sampling, a finite population of $N$ units is divided into $r$ subpopulations, called strata, of sizes $N_1, N_2, \ldots, N_r$. From each stratum a random sample is drawn, and the drawn samples are obtained independently in the different strata. Let $n_i$ be the size of the sample drawn from the $i$th stratum ($i = 1, 2, \ldots, r$). Let $y_{ij}$ denote the response value obtained from the $j$th unit within the $i$th stratum ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, n_i$). The population mean $\bar{Y}$ is

$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{r} \sum_{j=1}^{N_i} y_{ij} = \frac{1}{N} \sum_{i=1}^{r} N_i \bar{Y}_i,$$

where $\bar{Y}_i$ is the true mean for the $i$th stratum ($i = 1, 2, \ldots, r$). A stratified estimate of $\bar{Y}$ is $\bar{y}_{st}$ (st for stratified), where

$$\bar{y}_{st} = \frac{1}{N} \sum_{i=1}^{r} N_i \bar{y}_i,$$

in which $\bar{y}_i = (1/n_i)\sum_{j=1}^{n_i} y_{ij}$ is the mean of the sample from the $i$th stratum ($i = 1, 2, \ldots, r$). If, in every stratum, $\bar{y}_i$ is unbiased for $\bar{Y}_i$, then $\bar{y}_{st}$ is an unbiased estimator of $\bar{Y}$. The variance of $\bar{y}_{st}$ is

$$\operatorname{Var}(\bar{y}_{st}) = \frac{1}{N^2} \sum_{i=1}^{r} N_i^2 \operatorname{Var}(\bar{y}_i).$$

Since $\bar{y}_i$ is the mean of a random sample from a finite population, its variance is given by (see Cochran, 1963, page 22)

$$\operatorname{Var}(\bar{y}_i) = \frac{S_i^2}{n_i}(1 - f_i), \qquad i = 1, 2, \ldots, r,$$

where $f_i = n_i/N_i$, and

$$S_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2.$$

Hence,

$$\operatorname{Var}(\bar{y}_{st}) = \sum_{i=1}^{r} \frac{1}{n_i} L_i^2 S_i^2 (1 - f_i),$$

where $L_i = N_i/N$ ($i = 1, 2, \ldots, r$).

The sample sizes can be chosen by the sampler in an optimal 

way, the optimality criterion being the minimization of VarCÿ^^) for a speci- 
fied cost of taking the samples. Here, the cost is defined by the formula 

r 

COSt = Co+ 

i = l 

where c- is the cost per unit in the ith stratum (i= 1,2,..., r) and Cq is the 
overhead cost. Thus the optimal choice of the sample sizes is reduced to 
finding the values of tîi, tî 2 , • • • , t^at minimize — /,) 

subject to the constraint 


XL ^ ^0 ’ 


(7.85) 


/ = 1 


where d is constant. Using the method of Lagrange multipliers, we write 


F = E —Ljsf{l -fi) + A E CiHi + Co~d 


i=i «<■ 


= é è ^L]sf+k[ iciHi+c.-d . 

i = l i = l \/ = l 


Differentiating with respect to (i= 1, 2, ... , r), we obtain 


= + Ac, = 0, / = 1, 2, . . . , r. 


âri; 


Thus 




By substituting $n_i$ in the equality constraint (7.85) we get

$$\sqrt{\lambda} = \frac{\sum_{i=1}^{r}\sqrt{c_i}\,L_i S_i}{d - c_0}.$$





Therefore,

$$n_i = \frac{(d - c_0)\,L_i S_i}{\sqrt{c_i}\,\sum_{j=1}^{r}\sqrt{c_j}\,L_j S_j}, \qquad i = 1, 2, \ldots, r. \qquad (7.86)$$

It is easy to verify (using the sufficient conditions in Section 7.8) that the
values of $n_1, n_2, \ldots, n_r$ given by equation (7.86) minimize $\mathrm{Var}(\bar{y}_{st})$ under the
equality constraint (7.85). We conclude that $\mathrm{Var}(\bar{y}_{st})$ is minimized when $n_i$
is proportional to $(1/\sqrt{c_i})N_i S_i$ ($i = 1, 2, \ldots, r$). Consequently, $n_i$ must be
large if the corresponding stratum is large, if the cost of sampling per unit in
that stratum is low, or if the variability within the stratum is large.
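The allocation rule (7.86) is easy to check numerically. The Python sketch below computes the allocation and the variance formula for $\mathrm{Var}(\bar{y}_{st})$ derived above; the stratum sizes, standard deviations, unit costs, and budget are made-up illustrative values, not data from the text, and the sample sizes are left unrounded.

```python
import math

def optimal_allocation(N, S, c, c0, d):
    """Sample sizes n_i from (7.86), with L_i = N_i / N (not rounded to integers)."""
    total = sum(N)
    L = [Ni / total for Ni in N]
    denom = sum(math.sqrt(cj) * Lj * Sj for cj, Lj, Sj in zip(c, L, S))
    return [(d - c0) * Li * Si / (math.sqrt(ci) * denom) for Li, Si, ci in zip(L, S, c)]

def var_stratified_mean(N, S, n):
    """Var(ybar_st) = sum_i L_i^2 S_i^2 (1 - f_i) / n_i, where f_i = n_i / N_i."""
    total = sum(N)
    return sum((Ni / total) ** 2 * Si ** 2 * (1 - ni / Ni) / ni
               for Ni, Si, ni in zip(N, S, n))

def cost(c, c0, n):
    """Total cost c_0 + sum_i c_i n_i, the left-hand side of (7.85)."""
    return c0 + sum(ci * ni for ci, ni in zip(c, n))

# Hypothetical three-stratum population and budget (illustration only)
N = [400, 300, 300]       # stratum sizes N_i
S = [10.0, 20.0, 5.0]     # stratum standard deviations S_i
c = [1.0, 4.0, 1.0]       # cost per unit c_i
c0, d = 50.0, 350.0       # overhead cost and total budget

n_opt = optimal_allocation(N, S, c, c0, d)

# Proportional allocation (n_i proportional to N_i) scaled to the same budget
m = (d - c0) / sum(ci * Ni for ci, Ni in zip(c, N))
n_prop = [m * Ni for Ni in N]

print([round(v, 2) for v in n_opt])  # [68.57, 51.43, 25.71]
print(var_stratified_mean(N, S, n_opt) < var_stratified_mean(N, S, n_prop))  # True
```

The comparison with proportional allocation illustrates the conclusion above: at the same total cost, the allocation from (7.86) shifts units toward the stratum with the largest $N_i S_i/\sqrt{c_i}$ and yields a smaller variance.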


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Bickel, P. J., and K. A. Doksum (1977). Mathematical Statistics, Holden-Day, San 
Francisco. (Chap. 1 discusses distribution theory for transformation of random 
vectors.) 

Brownlee, K. A. (1965). Statistical Theory and Methodology, 2nd ed. Wiley, New York. 
(See Section 9.8 with regard to the Behrens-Fisher test.) 

Cochran, W. G. (1963). Sampling Techniques, 2nd ed. Wiley, New York. (This is a 
classic book on sampling theory as developed for use in sample surveys.) 

Corwin, L. J., and R. H. Szczarba (1982). Multivariate Calculus. Marcel Dekker, New
York. (This is a useful book that provides an introduction to multivariable
calculus. The topics covered include continuity, differentiation, multiple integrals, line and surface integrals, differential forms, and infinite series.)

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York. (Chap. 8 discusses 
limits and continuity for a multivariable function; Chap. 10 covers the inverse 
function theorem; Chap. 11 discusses multiple integration.)

Gillespie, R. P. (1954). Partial Differentiation. Oliver and Boyd, Edinburgh, Scotland.
(This concise book provides a brief introduction to multivariable calculus. It
covers partial differentiation, Taylor’s theorem, and maxima and minima of
functions of several variables.) 

Kaplan, W. (1991). Advanced Calculus, 4th ed. Addison-Wesley, Redwood City, 
California. (Topics pertaining to multivariable calculus are treated in several 
chapters including Chaps. 2, 3, 4, 5, and 6.) 

Kaplan, W., and D. J. Lewis (1971). Calculus and Linear Algebra, Vol. II. Wiley, New
York. (Chap. 12 gives a brief introduction to differential calculus of a multivariable function; Chap. 13 covers multiple integration.)

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. (Multivariate 
transformations are discussed in Chap. 10.) 

Lowerre, J. M. (1983). “An integral of the bivariate normal and an application.”
Amer. Statist., 37, 235-236. 

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New 
York. (Chap. 9 includes a study of multivariable functions.) 





Sagan, H. (1974). Advanced Calculus. Houghton Mifflin, Boston. (Chap. 9 covers
differential calculus of a multivariable function; Chap. 10 deals with the inverse
function and implicit function theorems; Chap. 11 discusses multiple integration.)

Satterthwaite, F. E. (1946). “An approximate distribution of estimates of variance
components.” Biometrics Bull., 2, 110-114.

Taylor, A. E., and W. R. Mann (1972). Advanced Calculus, 2nd ed. Wiley, New York.
(This book contains several chapters on multivariable calculus with many helpful
exercises.)

Thibaudeau, Y., and G. P. H. Styan (1985). “Bounds for Chakrabarti’s measure of
imbalance in experimental design.” In Proceedings of the First International Tampere Seminar on Linear Statistical Models and Their Applications, T. Pukkila and S.
Puntanen, eds. University of Tampere, Tampere, Finland, pp. 323-347.

Wen, L. (2001). “A counterexample for the two-dimensional density function.” Amer.
Math. Monthly, 108, 367-368.


EXERCISES 


In Mathematics 

7.1. Let $f(x_1, x_2)$ be a function defined on $R^2$ as

$x_2 \neq 0$,

(a) Show that $f(x_1, x_2)$ has a limit equal to zero as $\mathbf{x} = (x_1, x_2)' \to \mathbf{0}$
along any straight line through the origin.

(b) Show that $f(x_1, x_2)$ does not have a limit as $\mathbf{x} \to \mathbf{0}$.

7.2. Prove Lemma 7.3.1. 

7.3. Prove Lemma 7.3.2. 

7.4. Prove Lemma 7.3.3. 

7.5. Consider the function

$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2}{x_1^2 + x_2^2}, & (x_1, x_2) \neq (0, 0), \\ 0, & (x_1, x_2) = (0, 0). \end{cases}$$

(a) Show that $f(x_1, x_2)$ is not continuous at the origin.

(b) Show that the partial derivatives of $f(x_1, x_2)$ with respect to $x_1$ and
$x_2$ exist at the origin.

[Note: This exercise shows that a multivariable function does not have
to be continuous at a point in order for its partial derivatives to exist at
that point.]


7.6. The function $f(x_1, x_2, \ldots, x_k)$ is said to be homogeneous of degree $n$ in
$x_1, x_2, \ldots, x_k$ if for any nonzero scalar $t$,

$$f(tx_1, tx_2, \ldots, tx_k) = t^n f(x_1, x_2, \ldots, x_k)$$

for all $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ in the domain of $f$. Show that if
$f(x_1, x_2, \ldots, x_k)$ is homogeneous of degree $n$, then

$$\sum_{i=1}^{k} x_i \frac{\partial f}{\partial x_i} = n f.$$

[Note: This result is known as Euler’s theorem for homogeneous
functions.]


7.7. Consider the function

$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2^2}{x_1^2 + x_2^4}, & (x_1, x_2) \neq (0, 0), \\ 0, & (x_1, x_2) = (0, 0). \end{cases}$$

(a) Is $f$ continuous at the origin? Why or why not?

(b) Show that $f$ has a directional derivative in every direction at the
origin.

7.8. Let $S$ be a surface defined by the equation $f(\mathbf{x}) = c_0$, where $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ and $c_0$ is a constant. Let $C$ denote a curve on $S$ given
by the equations $x_1 = g_1(t), x_2 = g_2(t), \ldots, x_k = g_k(t)$, where
$g_1, g_2, \ldots, g_k$ are differentiable functions. Let $s$ be the arc length of $C$
measured from some fixed point in such a way that $s$ increases with $t$.
The curve can then be parameterized, using $s$ instead of $t$, in the form
$x_1 = h_1(s), x_2 = h_2(s), \ldots, x_k = h_k(s)$. Suppose that $f$ has partial derivatives with respect to $x_1, x_2, \ldots, x_k$.

Show that the directional derivative of $f$ at a point $\mathbf{x}$ on $C$ in the
direction of $\mathbf{v}$, where $\mathbf{v}$ is a unit tangent vector to $C$ at $\mathbf{x}$ (in the
direction of increasing $s$), is equal to $df/ds$.

7.9. Use Taylor’s expansion in a neighborhood of the origin to obtain a
second-order approximation for each of the following functions:

(a) $f(x_1, x_2) = \exp(x_2 \sin x_1)$.

(b) $f(x_1, x_2, x_3) = \sin(e^{x_1} + x_2^2 + x_3)$.

(c) $f(x_1, x_2) = \cos(x_1 x_2)$.

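7.10. Suppose that $f(x_1, x_2)$ and $g(x_1, x_2)$ are continuously differentiable
functions in a neighborhood of a point $\mathbf{x}_0 = (x_{10}, x_{20})'$. Consider the
equation $y_1 = f(x_1, x_2)$. Suppose that $\partial f/\partial x_1 \neq 0$ at $\mathbf{x}_0$.

(a) Show that

$$\frac{\partial x_1}{\partial x_2} = -\frac{\partial f/\partial x_2}{\partial f/\partial x_1}$$

in a neighborhood of $\mathbf{x}_0$.

(b) Suppose that in a neighborhood of $\mathbf{x}_0$,

$$\frac{\partial(f, g)}{\partial(x_1, x_2)} = 0.$$

Show that

$$\frac{\partial g}{\partial x_1}\frac{\partial x_1}{\partial x_2} + \frac{\partial g}{\partial x_2} = 0,$$

that is, $g$ is actually independent of $x_2$ in a neighborhood of $\mathbf{x}_0$.

(c) Deduce from (b) that there exists a function $\phi: D \to R$, where
$D \subset R$ is a neighborhood of $f(\mathbf{x}_0)$, such that

$$g(x_1, x_2) = \phi[f(x_1, x_2)]$$

throughout a neighborhood of $\mathbf{x}_0$. In this case, the functions $f$ and
$g$ are said to be functionally dependent.

(d) Show that if $f$ and $g$ are functionally dependent, then

$$\frac{\partial(f, g)}{\partial(x_1, x_2)} = 0.$$

[Note: From (b), (c), and (d) we conclude that $f$ and $g$ are functionally
dependent on a set $A \subset R^2$ if and only if $\partial(f, g)/\partial(x_1, x_2) = 0$ in $A$.]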


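7.11. Consider the equation

$$x_1\frac{\partial u}{\partial x_1} + x_2\frac{\partial u}{\partial x_2} + x_3\frac{\partial u}{\partial x_3} = nu.$$

Let $\xi_1 = x_1/x_3$, $\xi_2 = x_2/x_3$, $\xi_3 = x_3$. Use this change of variables to show
that the equation can be written as

$$\xi_3\frac{\partial u}{\partial \xi_3} = nu.$$

Deduce that $u$ is of the form

$$u = x_3^n F\left(\frac{x_1}{x_3}, \frac{x_2}{x_3}\right).$$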


7.12. Let $u_1$ and $u_2$ be defined as

$$u_1 = x_1(1 - x_2^2)^{1/2} + x_2(1 - x_1^2)^{1/2},$$
$$u_2 = (1 - x_1^2)^{1/2}(1 - x_2^2)^{1/2} - x_1 x_2.$$

Show that $u_1$ and $u_2$ are functionally dependent.


7.13. Let $\mathbf{f}: R^3 \to R^3$ be defined as

$$\mathbf{u} = \mathbf{f}(\mathbf{x}), \qquad \mathbf{x} = (x_1, x_2, x_3)', \qquad \mathbf{u} = (u_1, u_2, u_3)',$$

where $u_1 = x_1^3$, $u_2 = x_2^3$, $u_3 = x_3^3$.

(a) Show that the Jacobian matrix of $\mathbf{f}$ is not nonsingular in any subset
$D \subset R^3$ that contains points on any of the coordinate planes.

(b) Show that $\mathbf{f}$ has a unique inverse everywhere in $R^3$, including any
subset $D$ of the type described in (a).

[Note: This exercise shows that the nonvanishing of the Jacobian
determinant in Theorem 7.6.1 (inverse function theorem) is a sufficient
condition for the existence of an inverse function, but is not necessary.]


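7.14. Consider the equations

$$g_1(x_1, x_2, y_1, y_2) = 0,$$
$$g_2(x_1, x_2, y_1, y_2) = 0,$$

where $g_1$ and $g_2$ are differentiable functions defined on a set $D \subset R^4$.
Suppose that $\partial(g_1, g_2)/\partial(x_1, x_2) \neq 0$ in $D$. Show that

$$\frac{\partial x_1}{\partial y_1} = -\frac{\partial(g_1, g_2)}{\partial(y_1, x_2)} \Big/ \frac{\partial(g_1, g_2)}{\partial(x_1, x_2)},$$

$$\frac{\partial x_2}{\partial y_1} = -\frac{\partial(g_1, g_2)}{\partial(x_1, y_1)} \Big/ \frac{\partial(g_1, g_2)}{\partial(x_1, x_2)}.$$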



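7.15. Let $f(x_1, x_2, x_3) = 0$, $g(x_1, x_2, x_3) = 0$, where $f$ and $g$ are differentiable
functions defined on a set $D \subset R^3$. Suppose that

$$\frac{\partial(f, g)}{\partial(x_2, x_3)} \neq 0, \qquad \frac{\partial(f, g)}{\partial(x_3, x_1)} \neq 0, \qquad \frac{\partial(f, g)}{\partial(x_1, x_2)} \neq 0$$

in $D$. Show that

$$\frac{dx_1}{\partial(f, g)/\partial(x_2, x_3)} = \frac{dx_2}{\partial(f, g)/\partial(x_3, x_1)} = \frac{dx_3}{\partial(f, g)/\partial(x_1, x_2)}.$$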


7.16. Determine the stationary points of the following functions and check
for local minima and maxima:

(a) $f = x_1^2 + x_2^2 + x_1 + x_2 + x_1 x_2$.

(b) $f = 2\alpha x_1 - x_1 x_2 + x_1^2 + x_1 - x_2 + 1$, where $\alpha$ is a scalar. Can $\alpha$ be
chosen so that the stationary point is (i) a point of local minimum;
(ii) a point of local maximum; (iii) a saddle point?

(c) $f = x_1^3 - 6x_1 x_2 + 3x_2^2 - 24x_1 + 4$.

(d) $f = x_1^4 + x_2^4 - 2(x_1 - x_2)^2$.


7.17. Consider the function 

i+p + Er=iA- 

which is defined on the region

$$C = \{(p_1, p_2, \ldots, p_m) \mid 0 \le p_i \le 1,\ i = 1, 2, \ldots, m\},$$

where $p$ is a known constant. Show that

(a) $\partial f/\partial p_i$, for $i = 1, 2, \ldots, m$, vanish at exactly one point in $C$.

(b) The gradient vector $\nabla f = (\partial f/\partial p_1, \partial f/\partial p_2, \ldots, \partial f/\partial p_m)'$ does not
vanish anywhere on the boundary of $C$.

(c) $f$ attains its absolute maximum in the interior of $C$ at the point
$(p_1^0, p_2^0, \ldots, p_m^0)$, where

1+p^ 

p° = — — , i = l,2,...,m. 

1+p 

[Note: The function $f$ was considered in an article by Thibaudeau and
Styan (1985) concerning a measure of imbalance for experimental
designs.]

7.18. Show that the function $f = (x_2 - x_1^2)(x_2 - 2x_1^2)$ does not have a local
maximum or minimum at the origin, although it has a local minimum
for $t = 0$ along every straight line given by the equations $x_1 = at$, $x_2 = bt$,
where $a$ and $b$ are constants.



7.19. Find the optimal values of the function $f = x_1^2 + 12x_1 x_2 + 2x_2^2$ subject
to $4x_1^2 + x_2^2 = 25$. Determine the nature of the optima.

7.20. Find the minimum distance from the origin to the curve of intersection
of the surfaces $x_3(x_1 + x_2) = -2$ and $x_1 x_2 = 1$.

7.21. Apply the method of Lagrange multipliers to show that

$$(x_1^2 x_2^2 x_3^2)^{1/3} \le \tfrac{1}{3}(x_1^2 + x_2^2 + x_3^2)$$

for all values of $x_1$, $x_2$, $x_3$.

[Hint: Find the maximum value of $f = x_1^2 x_2^2 x_3^2$ subject to $x_1^2 + x_2^2 + x_3^2 = c^2$, where $c$ is a constant.]

7.22. Prove Theorem 7.9.3. 

7.23. Evaluate the following integrals:

(a) / IdX 2 '\/xi dxi dx 2 , where 

$D = \{(x_1, x_2) \mid x_1 > 0,\ x_2 > x_1,\ x_2 < 2 - x_1\}$.

/oH + ^X2)dx^]dX2^ 

7.24. Show that if $f(x_1, x_2)$ is continuous, then






7.25. Consider the integral



(a) Write an equivalent expression for $I$ by reversing the order of
integration.

(b) If gixi) = //r/^i/(xi, X2 ) At 2, find dg/dx^. 

7.26. Evaluate $\iint_D x_1 x_2\, dx_1\, dx_2$, where $D$ is a region enclosed by the four
parabolas $x_2^2 = x_1$, $x_2^2 = 2x_1$, $x_1^2 = x_2$, $x_1^2 = 2x_2$.

[Hint: Use a proper change of variables.]



7.27. Evaluate $\iiint_D (x_1^2 + x_2^2)\, dx_1\, dx_2\, dx_3$, where $D$ is a sphere of radius 1
centered at the origin.

[Hint: Make a change of variables using spherical polar coordinates of
the form

$$x_1 = r\sin\theta\cos\phi,$$
$$x_2 = r\sin\theta\sin\phi,$$
$$x_3 = r\cos\theta,$$

$0 \le r \le 1$, $0 \le \theta \le \pi$, $0 \le \phi \le 2\pi$.]


7.28. Find the value of the integral



dx 

{l+X^Ÿ 


[Hint: Consider the integral $\int dx/(a + x^2)$, where $a > 0$.]


In Statistics 

7.29. Suppose that the random vector $\mathbf{X} = (X_1, X_2)'$ has the density function

$$f(x_1, x_2) = \begin{cases} x_1 + x_2, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0 & \text{elsewhere}. \end{cases}$$

(a) Are the random variables $X_1$ and $X_2$ independent?

(b) Find the expected value of $X_1 X_2$.

7.30. Consider the density function $f(x_1, x_2)$ of $\mathbf{X} = (X_1, X_2)'$, where

$$f(x_1, x_2) = \begin{cases} 1, & -x_2 < x_1 < x_2,\ 0 < x_2 < 1, \\ 0 & \text{elsewhere}. \end{cases}$$

Show that $X_1$ and $X_2$ are uncorrelated random variables [that is,
$E(X_1 X_2) = E(X_1)E(X_2)$], but are not independent.

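7.31. The density function of $\mathbf{X} = (X_1, X_2)'$ is given by

$$f(x_1, x_2) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\Gamma(\beta)}\, x_1^{\alpha - 1} x_2^{\beta - 1} e^{-x_1 - x_2}, & 0 < x_1, x_2 < \infty, \\ 0 & \text{elsewhere}, \end{cases}$$

where $\alpha > 0$, $\beta > 0$, and $\Gamma(m)$ is the gamma function $\Gamma(m) = \int_0^\infty e^{-x} x^{m-1}\, dx$, $m > 0$. Suppose that $Y_1$ and $Y_2$ are random variables
defined as

$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = X_1 + X_2.$$

(a) Find the joint density function of $Y_1$ and $Y_2$.

(b) Find the marginal densities of $Y_1$ and $Y_2$.

(c) Are $Y_1$ and $Y_2$ independent?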

7.32. Suppose that $\mathbf{X} = (X_1, X_2)'$ has the density function

$$f(x_1, x_2) = \begin{cases} 10 x_1 x_2^2, & 0 < x_1 < x_2,\ 0 < x_2 < 1, \\ 0 & \text{elsewhere}. \end{cases}$$

Find the density function of $W = X_1 X_2$.

7.33. Find the density function of fF= (X^ given that X = (X^, X 2 )' 

has the density function 

r/ X 4X1X2 X. > 0 , Xo > 0 , 

f{x„X 2 ) = { ^ " . . 

1 ü elsewhere . 


7.34. Let $X_1, X_2, \ldots, X_n$ be independent random variables that have the
exponential density $f(x) = e^{-x}$, $x > 0$. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random
variables defined as

$$Y_1 = X_1,$$
$$Y_2 = X_1 + X_2,$$
$$\vdots$$
$$Y_n = X_1 + X_2 + \cdots + X_n.$$

Find the density of $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$, and then deduce the marginal
density of $Y_n$.

7.35. Prove formula (7.84). 

7.36. Let $X_1$ and $X_2$ be independent random variables such that $W_1 = (6/\sigma_1^2)X_1$ and $W_2 = (8/\sigma_2^2)X_2$ have the chi-squared distribution with
six and eight degrees of freedom, respectively, where $\sigma_1^2$ and $\sigma_2^2$ are
unknown parameters. Let $\theta = \frac{1}{7}\sigma_1^2 + \frac{1}{9}\sigma_2^2$. An unbiased estimator of $\theta$
is given by $\hat{\theta} = \frac{1}{7}X_1 + \frac{1}{9}X_2$, since $X_1$ and $X_2$ are unbiased estimators of
$\sigma_1^2$ and $\sigma_2^2$, respectively.

Using Satterthwaite’s approximation (see Satterthwaite, 1946), it can
be shown that $\eta\hat{\theta}/\theta$ is approximately distributed as a chi-squared
variate with $\eta$ degrees of freedom, where $\eta$ is given by

$$\eta = \frac{\left(\frac{1}{7}\sigma_1^2 + \frac{1}{9}\sigma_2^2\right)^2}{\frac{1}{6}\left(\frac{1}{7}\sigma_1^2\right)^2 + \frac{1}{8}\left(\frac{1}{9}\sigma_2^2\right)^2},$$

which can be written as

$$\eta = \frac{8(9 + 7\lambda)^2}{108 + 49\lambda^2},$$

where $\lambda = \sigma_2^2/\sigma_1^2$. It follows that the probability

$$p = P\left[\frac{\eta\hat{\theta}}{\chi^2_{0.025,\eta}} < \theta < \frac{\eta\hat{\theta}}{\chi^2_{0.975,\eta}}\right],$$

where $\chi^2_{\alpha,\eta}$ denotes the upper $100\alpha\%$ point of the chi-squared distribution with $\eta$ degrees of freedom, is approximately equal to 0.95.
Compute the exact value of $p$ using double integration, given that
$\lambda = 2$. Compare the result with the 0.95 value.

[Notes: (1) The density function of a chi-squared random variable with
$n$ degrees of freedom is given in Example 6.9.6. (2) In general, $\eta$ is
unknown. It can be estimated by $\hat{\eta}$, which results from replacing $\lambda$ with
$\hat{\lambda} = X_2/X_1$ in the formula for $\eta$. (3) The estimator $\hat{\theta}$ is used in the
Behrens-Fisher test statistic for comparing the means of two populations with unknown variances $\sigma_1^2$ and $\sigma_2^2$, which are assumed to be
unequal. If $\bar{y}_1$ and $\bar{y}_2$ are the means of two independent samples of
sizes $n_1 = 7$ and $n_2 = 9$, respectively, randomly chosen from these
populations, then $\theta$ is the variance of $\bar{y}_1 - \bar{y}_2$. In this case, $X_1$ and $X_2$
represent the corresponding sample variances. The Behrens-Fisher
$t$-statistic is then given by

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\hat{\theta}}}.$$

If the two population means are equal, $t$ has approximately the $t$-distribution with $\eta$ degrees of freedom. For more details about the
Behrens-Fisher test, see, for example, Brownlee (1965, Section 9.8).]
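The algebra that turns Satterthwaite’s expression for $\eta$ into $8(9 + 7\lambda)^2/(108 + 49\lambda^2)$ can be checked with exact rational arithmetic. The short Python sketch below does this for a few arbitrary variance pairs (illustrative values only) and also shows that $\eta$ depends on $\sigma_1^2$ and $\sigma_2^2$ only through $\lambda$; it does not compute the probability $p$ asked for in the exercise.

```python
from fractions import Fraction

def eta_satterthwaite(var1, var2):
    """Satterthwaite's eta for theta = var1/7 + var2/9, with 6 and 8 degrees of freedom."""
    a, b = var1 / 7, var2 / 9
    return (a + b) ** 2 / (a ** 2 / 6 + b ** 2 / 8)

def eta_in_terms_of_lambda(lam):
    """Simplified form 8(9 + 7*lam)^2 / (108 + 49*lam^2), where lam = sigma_2^2 / sigma_1^2."""
    return 8 * (9 + 7 * lam) ** 2 / (108 + 49 * lam ** 2)

# Fractions keep the arithmetic exact, so the two forms can be compared with ==
pairs = [(Fraction(1), Fraction(2)), (Fraction(3), Fraction(6)), (Fraction(5), Fraction(1, 3))]
for var1, var2 in pairs:
    assert eta_satterthwaite(var1, var2) == eta_in_terms_of_lambda(var2 / var1)

print(eta_in_terms_of_lambda(Fraction(2)))  # 529/38, about 13.92
```

For $\lambda = 2$ the sketch gives $\eta = 529/38 \approx 13.92$, which is the (non-integer) number of degrees of freedom at which the chi-squared percentage points in $p$ would be evaluated.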




7.37. Suppose that a parabola of the form $\mu = \beta_0 + \beta_1 x + \beta_2 x^2$ is fitted to a
set of paired data, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Obtain estimates of $\beta_0$,
$\beta_1$, and $\beta_2$ by minimizing $\sum_{i=1}^{n}[y_i - (\beta_0 + \beta_1 x_i + \beta_2 x_i^2)]^2$ with respect
to $\beta_0$, $\beta_1$, and $\beta_2$.

[Note: The estimates obtained in this manner are the least-squares
estimates of $\beta_0$, $\beta_1$, and $\beta_2$.]


7.38. Suppose that we have $k$ disjoint events $A_1, A_2, \ldots, A_k$ such that the
probability of $A_i$ is $p_i$ ($i = 1, 2, \ldots, k$) and $\sum_{i=1}^{k} p_i = 1$. Furthermore,
suppose that among $n$ independent trials there are $X_1, X_2, \ldots, X_k$
outcomes associated with $A_1, A_2, \ldots, A_k$, respectively. The joint probability that $X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k$ is given by the likelihood function

$$L(\mathbf{x}, \mathbf{p}) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k},$$

where $x_i = 0, 1, 2, \ldots, n$ for $i = 1, 2, \ldots, k$ such that $\sum_{i=1}^{k} x_i = n$, $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$, $\mathbf{p} = (p_1, p_2, \ldots, p_k)'$. This defines a joint distribution
for $X_1, X_2, \ldots, X_k$ known as the multinomial distribution.

Find the maximum likelihood estimates of $p_1, p_2, \ldots, p_k$ by maximizing $L(\mathbf{x}, \mathbf{p})$ subject to $\sum_{i=1}^{k} p_i = 1$.

[Hint: Maximize the natural logarithm of $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$ subject to
$\sum_{i=1}^{k} p_i = 1$.]

7.39. Let $\phi(y)$ be a positive, even, and continuous function on $(-\infty, \infty)$ such
that $\phi(y)$ is strictly decreasing on $(0, \infty)$, and $\int_{-\infty}^{\infty}\phi(y)\, dy = 1$. Consider
the following bivariate density function:

$$f(x, y) = \begin{cases} 1 + x/\phi(y), & -\phi(y) < x < 0, \\ 1 - x/\phi(y), & 0 \le x < \phi(y), \\ 0 & \text{otherwise}. \end{cases}$$

(a) Show that $f(x, y)$ is continuous for $-\infty < x, y < \infty$.

(b) Let $F(x, y)$ be the corresponding cumulative distribution function,

$$F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(s, t)\, ds\, dt.$$

Show that if $0 < \Delta x < \phi(0)$, then

$$\int_0^{\phi^{-1}(\Delta x)}\int_0^{\Delta x}\left[1 - \frac{s}{\phi(t)}\right] ds\, dt \ge \frac{1}{2}\Delta x\,\phi^{-1}(\Delta x),$$

where $\phi^{-1}$ is the inverse function of $\phi(y)$ for $0 < y < \infty$.

(c) Use part (b) to show that

$$\lim_{\Delta x \to 0^+}\frac{F(\Delta x, 0) - F(0, 0)}{\Delta x} = \infty.$$

Hence, $\partial F(x, y)/\partial x$ does not exist at $(0, 0)$.

(d) Deduce from part (c) that the equality

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y}$$

does not hold in this example.

[Note: This example was given by Wen (2001) to demonstrate that
continuity of $f(x, y)$ is not sufficient for the existence of $\partial F/\partial x$, and
hence for the validity of the equality in part (d).]



CHAPTER 8 


Optimization in Statistics 


Optimization is an essential feature in many problems in statistics. This is
apparent in almost all fields of statistics. Here are a few examples, some of
which will be discussed in more detail in this chapter.

1. In the theory of estimation, an estimator of an unknown parameter is
sought that satisfies a certain optimality criterion such as minimum
variance, maximum likelihood, or minimum average risk (as in the case
of a Bayes estimator). Some of these criteria were already discussed in
Section 7.11. For example, in regression analysis, estimates of the
parameters of a fitted model are obtained by minimizing a certain
expression that measures the closeness of the fit of the model. One
common example of such an expression is the sum of the squared
residuals (these are deviations of the observed response values from the
corresponding predicted response values, as specified by the model). This particular expression is used in the method of ordinary
least squares. A more general class of parameter estimators is the class
of M-estimators. See Huber (1973, 1981). The name “M-estimator”
comes from “generalized maximum likelihood.” They are based on the
idea of replacing the squared residuals by another symmetric function
of the residuals that has a unique minimum at zero. For example,
minimizing the sum of the absolute values of the residuals produces the
so-called least absolute values (LAV) estimators.

2. Estimates of the variance components associated with random or mixed
models are obtained by using several methods. In some of these
methods, the estimates are given as solutions to certain optimization
problems as in maximum likelihood (ML) estimation and minimum
norm quadratic unbiased estimation (MINQUE). In the former method,
the likelihood function is maximized under the assumption of normally
distributed data [see Hartley and Rao (1967)]. A completely different
approach is used in the latter method, which was proposed by Rao
(1970, 1971). This method does not require the normality assumption.




For a review of methods of estimating variance components, see Khuri 
and Sahai (1985). 

3. In statistical inference, tests are constructed so that they are optimal in
a certain sense. For example, in the Neyman-Pearson lemma (see, for
example, Roussas, 1973, Chapter 13), a test is obtained by minimizing
the probability of Type II error while holding the probability of Type I
error at a certain level.

4. In the field of response surface methodology, design settings are chosen
to minimize the prediction variance inside a region of interest, or to
minimize the bias that occurs from fitting the “wrong” model. Other
optimality criteria can also be considered. For example, under the
D-optimality criterion, the determinant of the variance-covariance matrix of the least-squares estimator of the vector of unknown parameters
(of a fitted model) is minimized with respect to the design settings.

5. Another objective of response surface methodology is the determination of optimum operating conditions on the input variables that
produce maximum, or minimum, response values inside a region of
interest. For example, in a particular chemical reaction setting, it may
be of interest to determine the reaction temperature and the reaction
time that maximize the percentage yield of a product. Optimum seeking
methods in response surface methodology will be discussed in detail in
Section 8.3.

6. Several response variables may be observed in an experiment for each
setting of a group of input variables. Such an experiment is called a
multiresponse experiment. In this case, optimization involves a number
of response functions and is therefore referred to as simultaneous (or
multiresponse) optimization. For example, it may be of interest to
maximize the yield of a certain chemical compound while reducing the
production cost. Multiresponse optimization will be discussed in Section 8.7.

7. In multivariate analysis, a large number of measurements may be
available as a result of some experiment. For convenience in the
analysis and interpretation of such data, it would be desirable to work
with fewer of the measurements, without loss of much information.
This problem of data reduction is dealt with by choosing certain linear
functions of the measurements in an optimal manner. Such linear
functions are called principal components.

Optimization of a multivariable function was discussed in Chapter 7.
However, there are situations in which the optimum cannot be obtained
explicitly by simply following the methods described in Chapter 7.
Instead, iterative procedures may be needed. In this chapter, we shall
first discuss some commonly used iterative optimization methods. A
number of these methods require the explicit evaluation of the partial
derivatives of the function to be optimized (objective function). These
are referred to as the gradient methods. Three other optimization
techniques that rely solely on the values of the objective function will
also be discussed. They are called direct search methods.


8.1. THE GRADIENT METHODS 

Let $f(\mathbf{x})$ be a real-valued function of $k$ variables $x_1, x_2, \ldots, x_k$, where
$\mathbf{x} = (x_1, x_2, \ldots, x_k)'$. The gradient methods are based on approximating $f(\mathbf{x})$
with a low-degree polynomial, usually of degree one or two, using Taylor’s
expansion. The first- and second-order partial derivatives of $f(\mathbf{x})$ are therefore assumed to exist at every point $\mathbf{x}$ in the domain of $f$. Without loss of
generality, we shall consider that $f$ is to be minimized.


8.1.1. The Method of Steepest Descent 

This method is based on a first-order approximation of $f(\mathbf{x})$ with a polynomial of degree one using Taylor’s theorem (see Section 7.5). Let $\mathbf{x}_0$ be an
initial point in the domain of $f(\mathbf{x})$. Let $\mathbf{x}_0 + t\mathbf{h}_0$ be a neighboring point,
where $t\mathbf{h}_0$ represents a small change in the direction of a unit vector $\mathbf{h}_0$ (that
is, $t > 0$). The corresponding change in $f(\mathbf{x})$ is $f(\mathbf{x}_0 + t\mathbf{h}_0) - f(\mathbf{x}_0)$. A first-order approximation of this change is given by

$$f(\mathbf{x}_0 + t\mathbf{h}_0) - f(\mathbf{x}_0) \approx t\mathbf{h}_0'\nabla f(\mathbf{x}_0), \qquad (8.1)$$

as can be seen from applying formula (7.27). If the objective is to minimize
$f(\mathbf{x})$, then $\mathbf{h}_0$ must be chosen so as to obtain the largest value for $-t\mathbf{h}_0'\nabla f(\mathbf{x}_0)$.
This is a constrained maximization problem, since $\mathbf{h}_0$ has unit length. For this
purpose we use the method of Lagrange multipliers. Let $F$ be the function

$$F = -t\mathbf{h}_0'\nabla f(\mathbf{x}_0) + \lambda(\mathbf{h}_0'\mathbf{h}_0 - 1).$$

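By differentiating $F$ with respect to the elements of $\mathbf{h}_0$ and equating the
derivatives to zero we obtain

$$\mathbf{h}_0 = \frac{t}{2\lambda}\nabla f(\mathbf{x}_0). \qquad (8.2)$$

Using the constraint $\mathbf{h}_0'\mathbf{h}_0 = 1$, we find that $\lambda$ must satisfy the equation

$$\lambda^2 = \frac{t^2}{4}\|\nabla f(\mathbf{x}_0)\|_2^2, \qquad (8.3)$$

where $\|\nabla f(\mathbf{x}_0)\|_2$ is the Euclidean norm of $\nabla f(\mathbf{x}_0)$. By (8.2), $-t\mathbf{h}_0'\nabla f(\mathbf{x}_0) = -(t^2/2\lambda)\|\nabla f(\mathbf{x}_0)\|_2^2$, so in order for $-t\mathbf{h}_0'\nabla f(\mathbf{x}_0)$
to have a maximum, $\lambda$ must be negative. From formula (8.3) we then have

$$\lambda = -\frac{t}{2}\|\nabla f(\mathbf{x}_0)\|_2.$$

By substituting this expression in formula (8.2) we get

$$\mathbf{h}_0 = -\frac{\nabla f(\mathbf{x}_0)}{\|\nabla f(\mathbf{x}_0)\|_2}. \qquad (8.4)$$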
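Formula (8.4) can be sanity-checked numerically: by the Cauchy-Schwarz inequality no unit vector gives a larger value of $-\mathbf{h}'\nabla f(\mathbf{x}_0)$ than $\mathbf{h}_0$, and the maximum equals $\|\nabla f(\mathbf{x}_0)\|_2$. The Python sketch below compares $\mathbf{h}_0$ with randomly drawn unit vectors in two dimensions; the function $f(x_1, x_2) = x_1^2 + 3x_2^2 + x_1x_2$ and the point $\mathbf{x}_0 = (1, 1)'$ are arbitrary choices made only for illustration.

```python
import math
import random

def grad_f(x):
    """Gradient of the illustrative function f(x1, x2) = x1**2 + 3*x2**2 + x1*x2."""
    return [2 * x[0] + x[1], 6 * x[1] + x[0]]

def steepest_descent_direction(g):
    """h_0 = -grad f(x_0) / ||grad f(x_0)||_2, formula (8.4)."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [-gi / norm for gi in g]

def decrease_rate(h, g):
    """-h' grad f(x_0): first-order rate of decrease of f along h, from (8.1)."""
    return -sum(hi * gi for hi, gi in zip(h, g))

x0 = [1.0, 1.0]
g = grad_f(x0)                       # [3.0, 7.0]
h0 = steepest_descent_direction(g)
best = decrease_rate(h0, g)          # equals ||grad f(x_0)||_2 = sqrt(58)

rng = random.Random(0)               # fixed seed for reproducibility
others = []
for _ in range(1000):
    angle = rng.uniform(0.0, 2.0 * math.pi)
    others.append(decrease_rate([math.cos(angle), math.sin(angle)], g))

print(best, max(others) <= best)
```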


Thus for a given $t > 0$, we can achieve a maximum reduction in $f(\mathbf{x}_0)$ by
moving from $\mathbf{x}_0$ in the direction specified by $\mathbf{h}_0$ in formula (8.4). The value of
$t$ is now determined by performing a linear search in the direction of $\mathbf{h}_0$. This
is accomplished by increasing the value of $t$ (starting from zero) until no
further reduction in the values of $f$ is obtained. Let such a value of $t$ be
denoted by $t_0$. The corresponding value of $\mathbf{x}$ is given by

$$\mathbf{x}_1 = \mathbf{x}_0 - t_0\frac{\nabla f(\mathbf{x}_0)}{\|\nabla f(\mathbf{x}_0)\|_2}.$$

Since the direction of $\mathbf{h}_0$ is in general not toward the location $\mathbf{x}^*$ of the
true minimum of $f$, the above process must be performed iteratively. Thus if
at stage $i$ we have an approximation $\mathbf{x}_i$ for $\mathbf{x}^*$, then at stage $i + 1$ we have the
approximation

$$\mathbf{x}_{i+1} = \mathbf{x}_i + t_i\mathbf{h}_i, \qquad i = 0, 1, 2, \ldots,$$

where

$$\mathbf{h}_i = -\frac{\nabla f(\mathbf{x}_i)}{\|\nabla f(\mathbf{x}_i)\|_2}, \qquad i = 0, 1, 2, \ldots,$$

and $t_i$ is determined by a linear search in the direction of $\mathbf{h}_i$, that is, $t_i$ is the
value of $t$ that minimizes $f(\mathbf{x}_i + t\mathbf{h}_i)$. Note that if it is desired to maximize $f$,
then for each $i$ ($\ge 0$) we need to move in the direction of $-\mathbf{h}_i$. In this case,
the method is called the method of steepest ascent.

Convergence of the method of steepest descent can be very slow, since
frequent changes of direction may be necessary. Another reason for slow
convergence is that the direction of $\mathbf{h}_i$ at the $i$th iteration may be nearly
perpendicular to the direction toward the minimum. Furthermore, the method
becomes inefficient when the first-order approximation of $f$ is no longer
adequate. In this case, a second-order approximation should be attempted.
This will be described in the next section.
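The iteration $\mathbf{x}_{i+1} = \mathbf{x}_i + t_i\mathbf{h}_i$ of the method of steepest descent can be sketched in a few lines of Python. The linear search used here is one possible choice, not the one prescribed by the text: $t$ is doubled from a small starting value until $f$ stops decreasing, and the bracketed minimizer is then refined by golden-section search (a unimodal $f(\mathbf{x}_i + t\mathbf{h}_i)$ is assumed). The test function $f(x_1, x_2) = (x_1 - 1)^2 + 10(x_2 + 2)^2$, whose elongated contours produce the zigzagging described above, is an illustrative choice.

```python
import math

def line_search(phi, t_start=1e-3, tol=1e-10):
    """Approximately minimize phi(t) over t >= 0 (phi assumed unimodal)."""
    prev, cur = 0.0, t_start
    if phi(cur) >= phi(prev):
        lo, hi = 0.0, t_start
    else:
        nxt = 2.0 * cur
        while phi(nxt) < phi(cur):        # keep increasing t while f keeps decreasing
            prev, cur, nxt = cur, nxt, 2.0 * nxt
        lo, hi = prev, nxt                # the minimizer lies in [prev, nxt]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    while hi - lo > tol:                  # golden-section refinement of the bracket
        if phi(a) < phi(b):
            hi, b = b, a
            a = hi - ratio * (hi - lo)
        else:
            lo, a = a, b
            b = lo + ratio * (hi - lo)
    return 0.5 * (lo + hi)

def steepest_descent(f, grad, x0, max_iter=500, gtol=1e-8):
    """Return (approximate minimizer, number of iterations used)."""
    x = list(x0)
    for i in range(max_iter):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < gtol:                   # gradient numerically zero: stop
            return x, i
        h = [-gi / norm for gi in g]      # unit steepest-descent direction h_i
        t = line_search(lambda s: f([xj + s * hj for xj, hj in zip(x, h)]))
        x = [xj + t * hj for xj, hj in zip(x, h)]
    return x, max_iter

def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

x_min, iterations = steepest_descent(f, grad, [0.0, 0.0])
print(x_min, iterations)
```

Running it shows that many iterations are needed even for this two-variable quadratic, which is the slow, zigzagging convergence mentioned above.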





8.1.2. The Newton-Raphson Method 

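Let $\mathbf{x}_0$ be an initial point in the domain of $f(\mathbf{x})$. By a Taylor’s expansion of $f$
in a neighborhood of $\mathbf{x}_0$ (see Theorem 7.5.1), it is possible to approximate
$f(\mathbf{x})$ with the quadratic function $\phi(\mathbf{x})$ given by

$$\phi(\mathbf{x}) = f(\mathbf{x}_0) + (\mathbf{x} - \mathbf{x}_0)'\nabla f(\mathbf{x}_0) + \frac{1}{2!}(\mathbf{x} - \mathbf{x}_0)'\mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0), \qquad (8.5)$$

where $\mathbf{H}_f(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$.

On the basis of formula (8.5) we can obtain a reasonable approximation to
the minimum of $f(\mathbf{x})$ by using the minimum of $\phi(\mathbf{x})$. If $\phi(\mathbf{x})$ attains a local
minimum at $\mathbf{x}_1$, then we must necessarily have $\nabla\phi(\mathbf{x}_1) = \mathbf{0}$ (see Section 7.7),
that is,

$$\nabla f(\mathbf{x}_0) + \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x}_1 - \mathbf{x}_0) = \mathbf{0}. \qquad (8.6)$$

If $\mathbf{H}_f(\mathbf{x}_0)$ is nonsingular, then from equation (8.6) we obtain

$$\mathbf{x}_1 = \mathbf{x}_0 - \mathbf{H}_f^{-1}(\mathbf{x}_0)\nabla f(\mathbf{x}_0).$$

If we now approximate $f(\mathbf{x})$ with another quadratic function, by again
applying Taylor’s expansion in a neighborhood of $\mathbf{x}_1$, and then repeat the
same process as before with $\mathbf{x}_1$ used instead of $\mathbf{x}_0$, we obtain the point

$$\mathbf{x}_2 = \mathbf{x}_1 - \mathbf{H}_f^{-1}(\mathbf{x}_1)\nabla f(\mathbf{x}_1).$$

Further repetitions of this process lead to a sequence of points
$\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_j, \ldots$, such that

$$\mathbf{x}_{j+1} = \mathbf{x}_j - \mathbf{H}_f^{-1}(\mathbf{x}_j)\nabla f(\mathbf{x}_j), \qquad j = 0, 1, 2, \ldots. \qquad (8.7)$$

The Newton-Raphson method requires finding the inverse of the Hessian
matrix at each iteration. This can be computationally involved, especially
if the number of variables, $k$, is large. Furthermore, the method may fail to
converge if $\mathbf{H}_f(\mathbf{x}_j)$ is not positive definite. This can occur, for example, when
$\mathbf{x}_j$ is far from the location $\mathbf{x}^*$ of the true minimum. If, however, the initial
point $\mathbf{x}_0$ is close to $\mathbf{x}^*$, then convergence occurs at a rapid rate.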
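A minimal Python sketch of iteration (8.7) follows. Instead of forming $\mathbf{H}_f^{-1}(\mathbf{x}_j)$ explicitly, each step solves the linear system $\mathbf{H}_f(\mathbf{x}_j)\mathbf{z} = \nabla f(\mathbf{x}_j)$ by Gaussian elimination with partial pivoting, which gives the same step at lower cost. The test function $f(x_1, x_2) = e^{x_1} + e^{-x_1} + (x_2 - 1)^2$, minimized at $(0, 1)'$ and with a positive definite Hessian everywhere, is an illustrative choice.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("Hessian is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def newton_raphson(grad, hess, x0, max_iter=50, tol=1e-12):
    """Iterate x_{j+1} = x_j - H_f^{-1}(x_j) grad f(x_j); return (x, iterations)."""
    x = list(x0)
    for j in range(max_iter):
        step = solve(hess(x), grad(x))        # z = H_f^{-1}(x_j) grad f(x_j)
        x = [xi - si for xi, si in zip(x, step)]
        if math.sqrt(sum(s * s for s in step)) < tol:
            return x, j + 1
    return x, max_iter

def grad(x):
    return [math.exp(x[0]) - math.exp(-x[0]), 2.0 * (x[1] - 1.0)]

def hess(x):
    return [[math.exp(x[0]) + math.exp(-x[0]), 0.0], [0.0, 2.0]]

x_star, n_iter = newton_raphson(grad, hess, [1.0, 5.0])
print(x_star, n_iter)
```

Starting from $(1, 5)'$, the quadratic part in $x_2$ is solved in a single step and the $x_1$ coordinate shrinks very rapidly, illustrating the fast local convergence noted above; from a starting point where the Hessian is not positive definite the same code could wander off or stop at a saddle point.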


8.1.3. The Davidon-Fletcher-Powell Method 

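This method is basically similar to the one in Section 8.1.1 except that at the
$i$th iteration we have

$$\mathbf{x}_{i+1} = \mathbf{x}_i - \theta_i\mathbf{G}_i\nabla f(\mathbf{x}_i), \qquad i = 0, 1, 2, \ldots,$$

where $\mathbf{G}_i$ is a positive definite matrix that serves as the $i$th approximation to
the inverse of the Hessian matrix $\mathbf{H}_f(\mathbf{x}_i)$, and $\theta_i$ is a scalar determined by a
linear search from $\mathbf{x}_i$ in the direction of $-\mathbf{G}_i\nabla f(\mathbf{x}_i)$, similar to the one for the
steepest descent method. The initial choice $\mathbf{G}_0$ of the matrix $\mathbf{G}$ can be any
positive definite matrix, but is usually taken to be the identity matrix. At the
$(i+1)$st iteration, $\mathbf{G}_i$ is updated by using the formula

$$\mathbf{G}_{i+1} = \mathbf{G}_i + \mathbf{L}_i + \mathbf{M}_i, \qquad i = 0, 1, 2, \ldots,$$

where

$$\mathbf{L}_i = -\frac{\mathbf{G}_i[\nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)][\nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)]'\mathbf{G}_i}{[\nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)]'\mathbf{G}_i[\nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)]},$$

$$\mathbf{M}_i = -\frac{\theta_i[\mathbf{G}_i\nabla f(\mathbf{x}_i)][\mathbf{G}_i\nabla f(\mathbf{x}_i)]'}{[\mathbf{G}_i\nabla f(\mathbf{x}_i)]'[\nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)]}.$$

The justification for this method is given in Fletcher and Powell (1963).
See also Bunday (1984, Section 4.3). Note that if $\mathbf{G}_0$ is initially chosen as the
identity, then the first increment is in the steepest descent direction $-\nabla f(\mathbf{x}_0)$.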

This is a powerful optimization method and is considered to be very 
efficient for most functions. 
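The updating scheme can be sketched in Python as follows. As shorthand, the code writes $\mathbf{s} = \mathbf{G}_i\nabla f(\mathbf{x}_i)$ and $\mathbf{y} = \nabla f(\mathbf{x}_{i+1}) - \nabla f(\mathbf{x}_i)$, so that $\mathbf{L}_i = -\mathbf{G}_i\mathbf{y}\mathbf{y}'\mathbf{G}_i/(\mathbf{y}'\mathbf{G}_i\mathbf{y})$ and $\mathbf{M}_i = -\theta_i\mathbf{s}\mathbf{s}'/(\mathbf{s}'\mathbf{y})$; this is the usual Davidon-Fletcher-Powell update. The linear search for $\theta_i$ doubles $\theta$ from a small starting value until $f$ stops decreasing and then refines by golden-section search, which is an assumption of this sketch since the text only requires some linear search. $\mathbf{G}_0$ is the identity, and the quadratic test function $f(x_1, x_2) = 4(x_1 - 1)^2 + (x_2 - 1)^2 - 2(x_1 - 1)(x_2 - 1)$, minimized at $(1, 1)'$, is illustrative.

```python
import math

def line_search(phi, t_start=1e-3, tol=1e-10):
    """Approximately minimize phi(t) over t >= 0: bracket by doubling, then golden section."""
    prev, cur = 0.0, t_start
    if phi(cur) >= phi(prev):
        lo, hi = 0.0, t_start
    else:
        nxt = 2.0 * cur
        while phi(nxt) < phi(cur):
            prev, cur, nxt = cur, nxt, 2.0 * nxt
        lo, hi = prev, nxt
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    while hi - lo > tol:
        if phi(a) < phi(b):
            hi, b = b, a
            a = hi - ratio * (hi - lo)
        else:
            lo, a = a, b
            b = lo + ratio * (hi - lo)
    return 0.5 * (lo + hi)

def dfp(f, grad, x0, max_iter=100, gtol=1e-9):
    """Davidon-Fletcher-Powell minimization; returns (x, iterations used)."""
    n = len(x0)
    x = list(x0)
    G = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]   # G_0 = I
    g = grad(x)
    for i in range(max_iter):
        if math.sqrt(sum(v * v for v in g)) < gtol:
            return x, i
        s = [sum(G[r][c] * g[c] for c in range(n)) for r in range(n)]     # G_i grad f(x_i)
        theta = line_search(lambda t: f([xj - t * sj for xj, sj in zip(x, s)]))
        x_new = [xj - theta * sj for xj, sj in zip(x, s)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]          # grad f(x_{i+1}) - grad f(x_i)
        Gy = [sum(G[r][c] * y[c] for c in range(n)) for r in range(n)]
        yGy = sum(a * b for a, b in zip(y, Gy))
        sy = sum(a * b for a, b in zip(s, y))
        if abs(yGy) > 1e-30 and abs(sy) > 1e-30:        # skip a degenerate update
            for r in range(n):
                for c in range(n):
                    L_rc = -Gy[r] * Gy[c] / yGy         # L_i (G_i symmetric: G y y'G = (Gy)(Gy)')
                    M_rc = -theta * s[r] * s[c] / sy    # M_i
                    G[r][c] += L_rc + M_rc
        x, g = x_new, g_new
    return x, max_iter

def f(x):
    u, v = x[0] - 1.0, x[1] - 1.0
    return 4.0 * u * u + v * v - 2.0 * u * v

def grad(x):
    u, v = x[0] - 1.0, x[1] - 1.0
    return [8.0 * u - 2.0 * v, 2.0 * v - 2.0 * u]

x_dfp, n_iter = dfp(f, grad, [0.0, 0.0])
print(x_dfp, n_iter)
```

On a quadratic in $k$ variables with accurate linear searches, the DFP iterates reach the minimum in about $k$ steps, which is one reason the method is regarded as efficient; the first step, with $\mathbf{G}_0 = \mathbf{I}$, coincides with a steepest descent step.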


8.2. THE DIRECT SEARCH METHODS 

The direct search methods do not require the evaluation of any partial
derivatives of the objective function. For this reason they are suited for
situations in which it is analytically difficult to provide expressions for the
partial derivatives, such as the minimization of the maximum absolute deviation. Three such methods will be discussed here, namely, the Nelder-Mead
simplex method, Price’s controlled random search procedure, and generalized simulated annealing.


8.2.1. The Nelder-Mead Simplex Method 

Let $f(\mathbf{x})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$, be the function to be minimized. The
simplex method is based on a comparison of the values of $f$ at the $k + 1$
vertices of a general simplex followed by a move away from the vertex with
the highest function value. By definition, a general simplex is a geometric
figure formed by a set of $k + 1$ points called vertices in a $k$-dimensional
space. Originally, the simplex method was proposed by Spendley, Hext, and
Himsworth (1962), who considered a regular simplex, that is, a simplex with
mutually equidistant points such as an equilateral triangle in a two-dimensional space ($k = 2$). Nelder and Mead (1965) modified this method by
allowing the simplex to be nonregular. This modified version of the simplex
method will be described here.

The simplex method follows a sequential search procedure. As was mentioned earlier, it begins by evaluating $f$ at the $k + 1$ points that form a
general simplex. Let these points be denoted by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$. Let $f_h$ and
$f_l$ denote, respectively, the largest and the smallest of the values
$f(\mathbf{x}_1), f(\mathbf{x}_2), \ldots, f(\mathbf{x}_{k+1})$. Let us also denote the points where $f_h$ and $f_l$ are
attained by $\mathbf{x}_h$ and $\mathbf{x}_l$, respectively.

Obviously, if we are interested in minimizing $f$, then a move away from $\mathbf{x}_h$
will be in order. Let us therefore define $\mathbf{x}_c$ as the centroid of all the points
with the exclusion of $\mathbf{x}_h$. Thus

$$\mathbf{x}_c = \frac{1}{k}\sum_{i \neq h}\mathbf{x}_i.$$

In order to move away from $\mathbf{x}_h$, we reflect $\mathbf{x}_h$ with respect to $\mathbf{x}_c$ to obtain the
point $\mathbf{x}_r^*$. More specifically, the latter point is defined by the relation

$$\mathbf{x}_r^* - \mathbf{x}_c = r(\mathbf{x}_c - \mathbf{x}_h),$$

or equivalently,

$$\mathbf{x}_r^* = (1 + r)\mathbf{x}_c - r\mathbf{x}_h,$$

where $r$ is a positive constant called the reflection coefficient and is given by

$$r = \frac{\|\mathbf{x}_r^* - \mathbf{x}_c\|_2}{\|\mathbf{x}_c - \mathbf{x}_h\|_2}.$$

The points $\mathbf{x}_h$, $\mathbf{x}_c$, and $\mathbf{x}_r^*$ are depicted in Figure 8.1.

Figure 8.1. A two-dimensional simplex with the reflection ($\mathbf{x}_r^*$), expansion ($\mathbf{x}_e^*$), and contraction ($\mathbf{x}_{ct}^*$) points.

Let us consider the following cases:

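a. If $f_l < f(\mathbf{x}_r^*) < f_h$, replace $\mathbf{x}_h$ by $\mathbf{x}_r^*$ and start the process again with the
new simplex (that is, evaluate $f$ at the vertices of the simplex which has
the same points as the original simplex, but with $\mathbf{x}_r^*$ substituted for $\mathbf{x}_h$).

b. If $f(\mathbf{x}_r^*) < f_l$, then the move from $\mathbf{x}_c$ to $\mathbf{x}_r^*$ is in the right direction and
should therefore be expanded. In this case, $\mathbf{x}_r^*$ is expanded to $\mathbf{x}_e^*$
defined by the relation

$$\mathbf{x}_e^* - \mathbf{x}_c = \gamma(\mathbf{x}_r^* - \mathbf{x}_c),$$

that is,

$$\mathbf{x}_e^* = \gamma\mathbf{x}_r^* + (1 - \gamma)\mathbf{x}_c,$$

where $\gamma$ ($> 1$) is an expansion coefficient given by

$$\gamma = \frac{\|\mathbf{x}_e^* - \mathbf{x}_c\|_2}{\|\mathbf{x}_r^* - \mathbf{x}_c\|_2}$$

(see Figure 8.1). This operation is called expansion. If $f(\mathbf{x}_e^*) < f_l$,
replace $\mathbf{x}_h$ by $\mathbf{x}_e^*$ and restart the process. However, if $f(\mathbf{x}_e^*) \ge f_l$, then
expansion is counterproductive. In this case, $\mathbf{x}_e^*$ is dropped, $\mathbf{x}_h$ is
replaced by $\mathbf{x}_r^*$, and the process is restarted.

c. If upon reflecting $\mathbf{x}_h$ to $\mathbf{x}_r^*$ we discover that $f(\mathbf{x}_r^*) > f(\mathbf{x}_i)$ for all $i \neq h$,
then replacing $\mathbf{x}_h$ by $\mathbf{x}_r^*$ would leave $f(\mathbf{x}_r^*)$ as the maximum in the new
simplex. In this case, a new $\mathbf{x}_h$ is defined to be either the old $\mathbf{x}_h$ or $\mathbf{x}_r^*$,
whichever has the lower value. A point $\mathbf{x}_{ct}^*$ is then found such that

$$\mathbf{x}_{ct}^* - \mathbf{x}_c = \beta(\mathbf{x}_h - \mathbf{x}_c),$$

that is,

$$\mathbf{x}_{ct}^* = \beta\mathbf{x}_h + (1 - \beta)\mathbf{x}_c,$$

where $\beta$ ($0 < \beta < 1$) is a contraction coefficient given by

$$\beta = \frac{\|\mathbf{x}_{ct}^* - \mathbf{x}_c\|_2}{\|\mathbf{x}_h - \mathbf{x}_c\|_2}.$$

Next, $\mathbf{x}_{ct}^*$ is substituted for $\mathbf{x}_h$ and the process is restarted unless
$f(\mathbf{x}_{ct}^*) > \min[f_h, f(\mathbf{x}_r^*)]$, that is, the contracted point is worse than the
better of $f_h$ and $f(\mathbf{x}_r^*)$. When such a contraction fails, the size of the
simplex is reduced by halving the distance of each point of the simplex
from $\mathbf{x}_l$, where, if we recall, $\mathbf{x}_l$ is the point generating the lowest
function value. Thus $\mathbf{x}_i$ is replaced by $\mathbf{x}_l + \frac{1}{2}(\mathbf{x}_i - \mathbf{x}_l)$, that is, by $\frac{1}{2}(\mathbf{x}_i + \mathbf{x}_l)$. The process is then restarted with the new reduced simplex.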




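Step 1. Select initial points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$ and calculate $f(\mathbf{x}_i)$, $i = 1, 2, \ldots, k + 1$. Determine $\mathbf{x}_h$, $\mathbf{x}_l$, and calculate $\mathbf{x}_c = \frac{1}{k}\sum_{i \neq h}\mathbf{x}_i$. Select $r > 0$, say $r = \frac{1}{2}$ or 1, find $\mathbf{x}_r^{*} = (1 + r)\mathbf{x}_c - r\mathbf{x}_h$, and calculate $f(\mathbf{x}_r^{*})$.

Step 2. (a) Calculate $\mathbf{x}_e^{**} = \gamma\mathbf{x}_r^{*} + (1 - \gamma)\mathbf{x}_c$ by choosing $\gamma > 1$, say $\gamma = 1.5$, then calculate $f(\mathbf{x}_e^{**})$.

(b) Replace $\mathbf{x}_h$ with $\mathbf{x}_r^{*}$.

Step 3. Replace $\mathbf{x}_h$ with $\mathbf{x}_e^{**}$.

Step 4. Calculate $\mathbf{x}_c^{**} = \beta\mathbf{x}_h + (1 - \beta)\mathbf{x}_c$ by choosing $0 < \beta < 1$, say $\beta = 0.5$, then calculate $f(\mathbf{x}_c^{**})$.

Step 5. Replace all the $\mathbf{x}_i$'s with $(\mathbf{x}_i + \mathbf{x}_l)/2$.

Step 6. Replace $\mathbf{x}_h$ with $\mathbf{x}_c^{**}$.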

Figure 8.2. Flow diagram for the Nelder-Mead simplex method. Source: Nelder and Mead 
(1965). Reproduced with permission of Oxford University Press. 


Thus at each stage in the minimization process, $\mathbf{x}_h$, the point at which $f$ has the highest value, is replaced by a new point according to one of three operations, namely, reflection, contraction, and expansion. As an aid to illustrating this step-by-step procedure, a flow diagram is shown in Figure 8.2. This flow diagram is similar to one given by Nelder and Mead (1965, page 309). Figure 8.2 lists the explanations of steps 1 through 6.

The criterion used to stop the search procedure is based on the variation in the function values over the simplex. At each step, the standard error $s$ of these values is calculated and compared with some preselected value $d$, where $f_1, f_2, \ldots, f_{k+1}$ denote the function values at the vertices of the simplex at hand and $\bar{f} = \sum_{i=1}^{k+1} f_i/(k + 1)$ is their mean. The search is halted when $s < d$. The reasoning behind this criterion is that when $s < d$, all function values are very close together. This hopefully indicates that the points of the simplex are near the minimum.
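The cases a, b, and c, the shrink step, and the stopping rule combine into a compact loop. The following is a minimal Python sketch written from the description above; it is not a transcription of Nelder and Mead's (1965) or Bunday's (1984) programs. The defaults $r = 1$, $\gamma = 1.5$, and $\beta = 0.5$ follow the suggestions in Figure 8.2; the divisor $k$ in the standard error $s$, the tolerance $d$, the iteration cap `max_iter`, and the quadratic test function are assumptions made for this sketch.

```python
import math


def nelder_mead(f, simplex, r=1.0, gamma=1.5, beta=0.5, d=1e-10, max_iter=5000):
    """Minimize f by the simplex steps (cases a, b, c) described in the text."""
    pts = [list(p) for p in simplex]
    vals = [f(p) for p in pts]
    k = len(pts[0])
    for _ in range(max_iter):
        # Stopping rule: standard error s of the k + 1 values (divisor k assumed).
        f_bar = sum(vals) / (k + 1)
        s = math.sqrt(sum((v - f_bar) ** 2 for v in vals) / k)
        if s < d:
            break
        i_h = max(range(k + 1), key=vals.__getitem__)   # x_h: largest f
        i_l = min(range(k + 1), key=vals.__getitem__)   # x_l: smallest f
        x_c = [sum(pts[i][j] for i in range(k + 1) if i != i_h) / k for j in range(k)]
        x_r = [(1 + r) * c - r * xh for c, xh in zip(x_c, pts[i_h])]   # reflection
        f_r = f(x_r)
        if f_r < vals[i_l]:
            # Case b: expansion x_e** = gamma x_r* + (1 - gamma) x_c.
            x_e = [gamma * xr + (1 - gamma) * c for xr, c in zip(x_r, x_c)]
            f_e = f(x_e)
            if f_e < vals[i_l]:
                pts[i_h], vals[i_h] = x_e, f_e
            else:
                pts[i_h], vals[i_h] = x_r, f_r
        elif all(f_r > vals[i] for i in range(k + 1) if i != i_h):
            # Case c: new x_h is the better of old x_h and x_r*, then contract.
            if f_r < vals[i_h]:
                pts[i_h], vals[i_h] = x_r, f_r
            x_cc = [beta * xh + (1 - beta) * c for xh, c in zip(pts[i_h], x_c)]
            f_cc = f(x_cc)
            if f_cc > vals[i_h]:
                # Contraction failed: halve every vertex's distance from x_l.
                x_l = pts[i_l]
                pts = [[(a + b) / 2 for a, b in zip(p, x_l)] for p in pts]
                vals = [f(p) for p in pts]
            else:
                pts[i_h], vals[i_h] = x_cc, f_cc
        else:
            # Case a: accept the reflected point.
            pts[i_h], vals[i_h] = x_r, f_r
    i_best = min(range(k + 1), key=vals.__getitem__)
    return pts[i_best], vals[i_best]


# Hypothetical test function with its minimum at (1, -0.5).
quad = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
x_min, f_min = nelder_mead(quad, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(x_min, f_min)
```

Each branch of the `if` statement corresponds to one of the cases a, b, and c above, so the loop mirrors the flow diagram rather than any tuned library implementation.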

Bunday (1984) provided the listing of a computer program which can be 
used to implement the steps described in the flow diagram. 

Olsson and Nelson (1975) demonstrated the usefulness of this method by 
using it to solve six minimization problems in statistics. The robustness of the 
method itself and its advantages relative to other minimization techniques 
were reported in Nelson (1973). 


8.2.2. Price’s Controlled Random Search Procedure 

The controlled random search procedure was introduced by Price (1977). It is capable of finding the absolute (or global) minimum of a function within a constrained region $R$. It is therefore well suited for a multimodal function, that is, a function that has several local minima within the region $R$.

The essential features of Price's algorithm are outlined in the flow diagram of Figure 8.3. A predetermined number, $N$, of trial points are randomly chosen inside the region $R$. The value of $N$ must be greater than $k$, the number of variables. The corresponding function values are obtained and stored in an array $A$ along with the coordinates of the $N$ chosen points. At each iteration, $k + 1$ distinct points, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$, are chosen at random from the $N$ points in storage. These $k + 1$ points form a simplex in a $k$-dimensional space. The point $\mathbf{x}_{k+1}$ is arbitrarily taken as the pole (designated vertex) of the simplex, and the next trial point $\mathbf{x}_t$ is obtained as the image (reflection) point of the pole with respect to the centroid $\mathbf{x}_c$ of the remaining $k$ points. Thus

$$\mathbf{x}_t = 2\mathbf{x}_c - \mathbf{x}_{k+1}.$$

The point $\mathbf{x}_t$ must satisfy the constraints of the region $R$. The value of the function $f$ at $\mathbf{x}_t$ is then compared with $f_{\max}$, the largest function value in storage. Let $\mathbf{x}_{\max}$ denote the point at which $f_{\max}$ is achieved. If $f(\mathbf{x}_t) < f_{\max}$, then $\mathbf{x}_{\max}$ is replaced in the array $A$ by $\mathbf{x}_t$. If $\mathbf{x}_t$ fails to satisfy the constraints of the region $R$, or if $f(\mathbf{x}_t) > f_{\max}$, then $\mathbf{x}_t$ is discarded and a new point is chosen by following the same procedure as the one used to obtain $\mathbf{x}_t$. As the algorithm proceeds, the $N$ points in storage tend to cluster around points at which the function values are lower than the current value of $f_{\max}$. Price did not specify a particular stopping rule; he left it to the user to do so. A possible stopping criterion is to terminate the search when the $N$ points in storage cluster in a small region of the $k$-dimensional space, that is, when $f_{\max}$ and $f_{\min}$ are close together, where $f_{\min}$ is the smallest function value in storage. Another possibility is to stop after a specified number of function evaluations have been made. In any case, the rate of convergence of the procedure depends on the value of $N$, the complexity of the function $f$, the nature of the constraints, and the way in which the set of trial points is chosen.

Figure 8.3. A flow diagram for Price's procedure. Source: Price (1977). Reproduced with permission of Oxford University Press.

Price's procedure is simple and does not necessarily require a large value of $N$. It is sufficient that $N$ should increase linearly with $k$. Price chose, for example, the value $N = 50$ for $k = 2$. The value $N = 10k$ has proved useful for many functions. Furthermore, the region constraints can be quite complex. A FORTRAN program for the implementation of Price's algorithm was written by Conlon (1991).
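A minimal Python sketch of Price's procedure as described above follows. It assumes, for simplicity, that $R$ is a rectangular box; it uses $N = 10k$ by default, the stopping rule based on $f_{\max} - f_{\min}$ suggested in the text, and a cap on the number of trials. The function name, its parameters, and the test function are hypothetical; this is not Conlon's (1991) FORTRAN program.

```python
import random


def price_crs(f, lower, upper, n_points=None, tol=1e-10, max_trials=100000, seed=0):
    """Controlled random search (Price, 1977) over the box lower <= x <= upper."""
    rng = random.Random(seed)
    k = len(lower)
    n = n_points if n_points is not None else 10 * k   # N = 10k; must exceed k

    def in_region(x):
        return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))

    # Array A: N random trial points in R, stored with their function values.
    A = []
    for _ in range(n):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        A.append([f(x), x])

    for _ in range(max_trials):
        i_max = max(range(n), key=lambda i: A[i][0])
        f_max = A[i_max][0]
        f_min = min(entry[0] for entry in A)
        if f_max - f_min < tol:                # stored points have clustered
            break
        chosen = rng.sample(range(n), k + 1)   # k + 1 distinct stored points
        pole = A[chosen[-1]][1]                # x_{k+1} is the pole
        x_c = [sum(A[i][1][j] for i in chosen[:-1]) / k for j in range(k)]
        x_t = [2.0 * c - p for c, p in zip(x_c, pole)]   # x_t = 2 x_c - x_{k+1}
        if not in_region(x_t):
            continue                           # discard and draw a new simplex
        f_t = f(x_t)
        if f_t < f_max:
            A[i_max] = [f_t, x_t]              # replace x_max by x_t
    best = min(A, key=lambda entry: entry[0])
    return best[1], best[0]


# Hypothetical example on R = [-5, 5] x [-5, 5].
x_best, f_best = price_crs(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                           lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(x_best, f_best)
```

Because trial points outside $R$ are simply discarded, more complex constraints can be handled by replacing `in_region` with any feasibility test.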


8.2.3. The Generalized Simulated Annealing Method 

This method derives its name from the annealing of metals, in which many final crystalline configurations (corresponding to different energy states) are possible, depending on the rate of cooling (see Kirkpatrick, Gelatt, and Vecchi, 1983). The method can be applied to find the absolute (or global) optimum of a multimodal function $f$ within a constrained region $R$ in a $k$-dimensional space.

Bohachevsky, Johnson, and Stein (1986) presented a generalization of the method of simulated annealing for function optimization. The following is a description of their algorithm for function minimization (a similar one can be used for function maximization): Let $f_m$ be some tentative estimate of the minimum of $f$ over the region $R$. The method proceeds according to the following steps (reproduced with permission of the American Statistical Association):


1. Select an initial point $\mathbf{x}_0$ in $R$. This point can be chosen at random or specified depending on available information.

2. Calculate $f_0 = f(\mathbf{x}_0)$. If $|f_0 - f_m| < \epsilon$, where $\epsilon$ is a specified small constant, then stop.

3. Choose a random direction of search by generating $k$ independent standard normal variates $z_1, z_2, \ldots, z_k$; then compute the elements of the random vector $\mathbf{u} = (u_1, u_2, \ldots, u_k)'$, where

$$u_i = \frac{z_i}{(z_1^2 + z_2^2 + \cdots + z_k^2)^{1/2}}, \qquad i = 1, 2, \ldots, k,$$

and $k$ is the number of variables in the function.



OPTIMIZATION TECHNIQUES IN RESPONSE SURFACE METHODOLOGY 




4. Set $\mathbf{x}_1 = \mathbf{x}_0 + \Delta r\,\mathbf{u}$, where $\Delta r$ is the size of a step to be taken in the direction of $\mathbf{u}$. The magnitude of $\Delta r$ depends on the properties of the objective function and on the desired accuracy.

5. If $\mathbf{x}_1$ does not belong to $R$, return to step 3. Otherwise, compute $f_1 = f(\mathbf{x}_1)$ and $\Delta f = f_1 - f_0$.

6. If $f_1 < f_0$, set $\mathbf{x}_0 = \mathbf{x}_1$ and $f_0 = f_1$. If $|f_0 - f_m| < \epsilon$, stop. Otherwise, go to step 3.

7. If $f_1 > f_0$, set a probability value given by $p = \exp(-\beta f_0^{\,g}\,\Delta f)$, where $\beta$ is a positive number such that $0.50 < \exp(-\beta\,\Delta f) < 0.90$, and $g$ is an arbitrary negative number. Then, generate a random number $v$ from the uniform distribution $U(0, 1)$. If $v > p$, go to step 3. Otherwise, if $v < p$, set $\mathbf{x}_0 = \mathbf{x}_1$, $f_0 = f_1$, and go to step 3.
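Steps 1 to 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Bohachevsky, Johnson, and Stein's code: $R$ is a box, $f$ is assumed positive on $R$ so that $f_0^{\,g}$ is defined for negative $g$, $f_m$ is held fixed, and an iteration cap is added. The values of $\Delta r$ (`step`), $\beta$, and $g$ are hypothetical; in practice $\beta$ would be tuned so that $0.50 < \exp(-\beta\,\Delta f) < 0.90$ for typical increments.

```python
import math
import random


def gsa_minimize(f, x0, in_region, f_m, step=0.05, beta=200.0, g=-1.0,
                 eps=1e-3, max_iter=200000, seed=1):
    """Generalized simulated annealing, steps 1-7 (sketch).

    Assumes f > 0 on R, so that f0 ** g is defined for negative g.
    """
    rng = random.Random(seed)
    k = len(x0)
    x0 = list(x0)                                        # step 1: initial point in R
    f0 = f(x0)                                           # step 2
    for _ in range(max_iter):
        if abs(f0 - f_m) < eps:                          # stopping test (steps 2 and 6)
            break
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]      # step 3
        norm = math.sqrt(sum(zi * zi for zi in z))
        u = [zi / norm for zi in z]                      # random unit direction
        x1 = [a + step * b for a, b in zip(x0, u)]       # step 4
        if not in_region(x1):                            # step 5
            continue
        f1 = f(x1)
        delta_f = f1 - f0
        if f1 < f0:                                      # step 6: beneficial step
            x0, f0 = x1, f1
        else:                                            # step 7: detrimental step
            p = math.exp(-beta * f0 ** g * delta_f)
            v = rng.random()                             # v ~ U(0, 1)
            if v < p:
                x0, f0 = x1, f1
    return x0, f0


# Hypothetical positive test function with minimum value 1 at (1, 2).
func = lambda x: 1.0 + (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
box = lambda x: all(-5.0 <= xi <= 5.0 for xi in x)
x_end, f_end = gsa_minimize(func, [0.0, 0.0], box, f_m=1.0)
print(x_end, f_end)
```

Lowering `f_m` and continuing the search, as described in the text that follows, would be handled by calling the function again from the returned point.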

From steps 6 and 7 we note that beneficial steps (that is, $f_1 < f_0$) are accepted unconditionally, but detrimental steps ($f_1 > f_0$) are accepted according to a probability value $p$ described in step 7. If $v < p$, then the step leading to $\mathbf{x}_1$ is accepted; otherwise, it is rejected and a step in a new random direction is attempted. Thus the probability of accepting an increment of $f$ depends on the size of the increment: the larger the increment, the smaller the probability of its acceptance.

Several possible values of the tentative estimate $f_m$ can be attempted. For a given $f_m$ we proceed with the search until $f - f_m$ becomes negative. Then, we decrease $f_m$, continue the search, and repeat the process when necessary. Bohachevsky, Johnson, and Stein gave an example in optimal design theory to illustrate the application of their algorithm.

Price's (1977) controlled random search algorithm produces results comparable to those of simulated annealing, but with fewer tuning parameters. It is also better suited for problems with constrained regions.


8.3. OPTIMIZATION TECHNIQUES IN RESPONSE 
SURFACE METHODOLOGY 

Response surface methodology (RSM) is an area in the design and analysis of experiments. It consists of a collection of techniques that encompass:

1. Conducting a series of experiments based on properly chosen settings of a set of input variables, denoted by $x_1, x_2, \ldots, x_k$, that influence a response of interest $y$. The choice of these settings is governed by certain criteria whose purpose is to produce adequate and reliable information about the response. The collection of all such settings constitutes a matrix $\mathbf{D}$ of order $n \times k$, where $n$ is the number of experimental runs. The matrix $\mathbf{D}$ is referred to as a response surface design.





2. Determining a mathematical model that best fits the data collected under the design chosen in (1). Regression techniques can be used to evaluate the adequacy of fit of the model and to conduct appropriate tests concerning the model's parameters.

3. Determining optimal operating conditions on the input variables that produce maximum (or minimum) response value within a region of interest $R$.

This last aspect of RSM can help the experimenter in determining the best combinations of the input variables that lead to desirable response values. For example, in drug manufacturing, two drugs are tested with regard to reducing blood pressure in humans. A series of clinical trials involving a certain number of high blood pressure patients is set up, and each patient is given some predetermined combination of the two drugs. After a period of time the patient's blood pressure is checked. This information can be used to find the specific combination of the drugs that results in the greatest reduction in the patient's blood pressure within some specified time interval.

In this section we shall describe two well-known optimum-seeking proce- 
dures in RSM. These include the method of steepest ascent (or descent) and 
ridge analysis. 


8.3.1. The Method of Steepest Ascent 

This is an adaptation of the method described in Section 8.1.1 to a response 
surface environment; here the objective is to increase the value of a certain 
response function. 

The method of steepest ascent requires performing a sequence of sets of trials. Each set is obtained as a result of proceeding sequentially along a path of maximum increase in the values of a given response $y$, which can be observed in an experiment. This method was first introduced by Box and Wilson (1951) for the general area of RSM.

The procedure of steepest ascent depends on approximating a response surface with a hyperplane in some restricted region. The hyperplane is represented by a first-order model which can be fitted to a data set obtained as a result of running experimental trials using a first-order design such as a complete $2^k$ factorial design, where $k$ is the number of input variables in the model. A fraction of this design can also be used if $k$ is large [see, for example, Section 3.3.2 in Khuri and Cornell (1996)]. The fitted first-order model is then used to determine a path along which one may initially observe increasing response values. However, due to curvature in the response surface, the initial increase in the response will likely be followed by a leveling off, and then a decrease. At this stage, a new series of experiments is performed (using again a first-order design) and the resulting data are used to fit another first-order model. A new path is determined along which increasing response values may be observed. This process continues until it becomes evident that little or no additional increase in the response can be gained.

Let us now consider more specific details of this sequential procedure. Let $y(\mathbf{x})$ be a response function that depends on $k$ input variables, $x_1, x_2, \ldots, x_k$, which form the elements of a vector $\mathbf{x}$. Suppose that in some restricted region $y(\mathbf{x})$ is adequately represented by a first-order model of the form

$$y(\mathbf{x}) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \epsilon, \tag{8.8}$$

where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters and $\epsilon$ is a random error. This model is fitted using data collected under a first-order design (for example, a $2^k$ factorial design or a fraction thereof). The data are utilized to calculate the least-squares estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ of the model's parameters. These are elements of $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, where $\mathbf{X} = [\mathbf{1}_n : \mathbf{D}]$ with $\mathbf{1}_n$ being a vector of ones of order $n \times 1$, $\mathbf{D}$ is the design matrix of order $n \times k$, and $\mathbf{y}$ is the corresponding vector of response values. It is assumed that the random errors associated with the $n$ response values are independently distributed with means equal to zero and a common variance $\sigma^2$. The predicted response $\hat{y}(\mathbf{x})$ is then given by

$$\hat{y}(\mathbf{x}) = \hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i x_i. \tag{8.9}$$

The input variables are coded so that the design center coincides with the origin of the coordinate system.
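For a two-level factorial in coded units the least-squares computation becomes very simple, because the columns of $\mathbf{X} = [\mathbf{1}_n : \mathbf{D}]$ are mutually orthogonal and $\mathbf{X}'\mathbf{X} = n\mathbf{I}_{k+1}$, so that $\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}/n$. A minimal Python sketch, assuming a full $2^k$ factorial without center points and hypothetical response values (not data from the text):

```python
from itertools import product


def two_level_factorial(k):
    """Coded 2^k factorial design D: n = 2^k runs with entries -1 and +1."""
    return [list(run) for run in product((-1.0, 1.0), repeat=k)]


def fit_first_order(D, y):
    """Least-squares estimates (b0, b1, ..., bk) for model (8.8), with X = [1_n : D]."""
    n = len(D)
    X = [[1.0] + row for row in D]
    p = len(X[0])
    # In a coded 2^k design X'X = n I, so (X'X)^{-1} X'y reduces to X'y / n.
    # Check that assumption before using the shortcut.
    for a in range(p):
        for b in range(p):
            s = sum(row[a] * row[b] for row in X)
            if abs(s - (n if a == b else 0.0)) > 1e-9:
                raise ValueError("design is not an orthogonal two-level factorial")
    return [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(p)]


def predict_first_order(b, x):
    """Predicted response (8.9): b0 + sum_i b_i x_i."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


D = two_level_factorial(2)          # runs (-1,-1), (-1,1), (1,-1), (1,1)
y = [9.1, 6.9, 13.0, 11.2]          # hypothetical responses
b = fit_first_order(D, y)
print(b)
```

At the design center the prediction equals $\hat{\beta}_0$, which for this design is simply the average of the responses.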

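The next step is to move a distance of $r$ units away from the design center (or the origin) such that a maximum increase in $\hat{y}$ can be obtained. To determine the direction to be followed to achieve such an increase, we need to maximize $\hat{y}(\mathbf{x})$ subject to the constraint $\sum_{i=1}^{k} x_i^2 = r^2$ using the method of Lagrange multipliers. Consider therefore the function

$$Q(\mathbf{x}) = \hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i x_i - \lambda\left(\sum_{i=1}^{k} x_i^2 - r^2\right), \tag{8.10}$$

where $\lambda$ is a Lagrange multiplier. Setting the partial derivatives of $Q$ equal to zero produces the equations

$$x_i = \frac{\hat{\beta}_i}{2\lambda}, \qquad i = 1, 2, \ldots, k.$$

For a maximum, $\lambda$ must be positive. Using the equality constraint, we conclude that

$$\lambda = \frac{1}{2r}\left(\sum_{i=1}^{k} \hat{\beta}_i^2\right)^{1/2}.$$

A local maximum is then achieved at the point whose coordinates are given by

$$x_i = \frac{r\hat{\beta}_i}{\left(\sum_{j=1}^{k} \hat{\beta}_j^2\right)^{1/2}}, \qquad i = 1, 2, \ldots, k,$$

which can be written as

$$x_i = r e_i, \qquad i = 1, 2, \ldots, k, \tag{8.11}$$

where $e_i = \hat{\beta}_i \left(\sum_{j=1}^{k} \hat{\beta}_j^2\right)^{-1/2}$, $i = 1, 2, \ldots, k$. Thus $\mathbf{e} = (e_1, e_2, \ldots, e_k)'$ is a unit vector in the direction of $(\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)'$. Equations (8.11) indicate that at a distance of $r$ units away from the origin, a maximum increase in $\hat{y}$ occurs along a path in the direction of $\mathbf{e}$. Since this is the only local maximum on the hypersphere of radius $r$, it must be the absolute maximum.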
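Equations (8.11) give the path directly from the fitted slopes. A minimal Python sketch with hypothetical estimates $\hat{\beta}_0 = 10$, $\hat{\beta}_1 = 3$, $\hat{\beta}_2 = -4$; it also evaluates the Lagrange multiplier $\lambda = (2r)^{-1}(\sum \hat{\beta}_i^2)^{1/2}$ for $r = 5$, so that the relation $x_i = \hat{\beta}_i/(2\lambda)$ can be checked.

```python
import math


def steepest_ascent_path(b, radii):
    """Unit vector e of equations (8.11) and the points x = r e for each radius r.

    b = (b0, b1, ..., bk) are the fitted first-order coefficients; b0 is not used.
    """
    slopes = b[1:]
    norm = math.sqrt(sum(bi * bi for bi in slopes))
    e = [bi / norm for bi in slopes]               # e_i = b_i / (sum b_j^2)^(1/2)
    return e, [[r * ei for ei in e] for r in radii]


b = [10.0, 3.0, -4.0]                              # hypothetical estimates
e, path = steepest_ascent_path(b, radii=[1.0, 2.0, 5.0])
lam = math.sqrt(3.0 ** 2 + 4.0 ** 2) / (2 * 5.0)   # Lagrange multiplier for r = 5
print(e, path[-1], lam)
```

In practice an experimenter would run trials at such points along the path until the observed response stops increasing.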

If the actual response value (that is, the value of $y$) at the point $\mathbf{x} = r\mathbf{e}$ exceeds its value at the origin, then a move along the path determined by $\mathbf{e}$ is in order. A series of experiments is then conducted to obtain response values at several points along the path until no additional increase in the response is evident. At this stage, a new first-order model is fitted using data collected under a first-order design centered at a point in the vicinity of the point at which that first drop in the response was observed along the path. This model leads to a new direction similar to the one given by formula (8.11). As before, a series of experiments is conducted along the new path until no further increase in the value of $y$ can be observed. The process of moving along different paths continues until it becomes evident that little or no additional increase in $y$ can be gained. This usually occurs when the first-order model becomes inadequate as the method progresses, due to curvature in the response surface. It is therefore necessary to test each fitted first-order model for lack of fit at every stage of the process. This can be accomplished by taking repeated observations at the center of each first-order design and at possibly some other design points in order to obtain an independent estimate of the error variance that is needed for the lack of fit test [see, for example, Sections 2.6 and 3.4 in Khuri and Cornell (1996)]. If the lack of fit test is significant, indicating an inadequate model, then the process is stopped and a more elaborate experiment must be conducted to fit a higher-order model, as will be seen in the next section.

Examples that illustrate the application of the method of steepest ascent can be found in Box and Wilson (1951), Bayne and Rubin (1986, Section 5.2), Khuri and Cornell (1996, Chapter 5), and Myers and Khuri (1979). In the last reference, the authors present a stopping rule along a path that takes into account random error variation in the observed response. We recall that a search along a path is discontinued as soon as a drop in the response is first observed. Since response values are subject to random error, the decision to stop can be premature due to a false drop in the observed response. The stopping rule by Myers and Khuri (1979) protects against taking too many observations along a path when in fact the true mean response (that is, the mean of $y$) is decreasing. It also protects against stopping prematurely when the true mean response is increasing.

It should be noted that the procedure of steepest ascent is not invariant with respect to the scales of the input variables $x_1, x_2, \ldots, x_k$. This is evident from the fact that a path taken by the procedure is determined by the least-squares estimates $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ [see equations (8.11)], which depend on the scales of the $x_i$'s.
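A small numerical illustration of this lack of invariance, with hypothetical slopes: suppose $\hat{\beta}_1 = 2$ and $\hat{\beta}_2 = 1$ in the coded variables, and $x_1$ is re-expressed in new units $w_1 = x_1/2$, so that the slope of $w_1$ becomes 4. Computing the steepest ascent direction in the $(w_1, x_2)$ scale and mapping it back to the $(x_1, x_2)$ scale gives a direction proportional to $(8, 1)'$ rather than $(2, 1)'$. A minimal Python sketch:

```python
import math


def unit(v):
    """Normalize a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


# Hypothetical slopes in the coded scale (x1, x2).
slopes_x = [2.0, 1.0]
dir_x = unit(slopes_x)                      # steepest ascent direction in (x1, x2)

# Re-express x1 in new units w1 = x1 / c, i.e. x1 = c * w1, with c = 2.
# Since b1 * x1 = (c * b1) * w1, the fitted slope of w1 is c * b1.
c = 2.0
slopes_w = [c * slopes_x[0], slopes_x[1]]
dir_w = unit(slopes_w)                      # steepest ascent direction in (w1, x2)

# Map the (w1, x2) direction back to (x1, x2) coordinates using x1 = c * w1.
dir_w_in_x = unit([c * dir_w[0], dir_w[1]])

cos_angle = sum(a * b for a, b in zip(dir_x, dir_w_in_x))
print(dir_x, dir_w_in_x, cos_angle)
```

The cosine of the angle between the two paths is about 0.94, so the two scalings lead the experimenter along visibly different paths in the same physical space.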

There are situations in which it is of interest to determine conditions that lead to a decrease in the response, instead of an increase. For example, in a chemical investigation it may be desired to decrease the level of impurity or the unit cost. In this case, a path of steepest descent will be needed. This can be accomplished by changing the sign of the response $y$, followed by an application of the method of steepest ascent. Thus any steepest descent problem can be handled by the method of steepest ascent.


8.3.2. The Method of Ridge Analysis 

The method of steepest ascent is most often used as a maximum-region-seeking procedure. By this we mean that it is used as a preliminary tool to get quickly to the region where the maximum of the mean response is located. Since the first-order approximation of the mean response will eventually break down, a better estimate of the maximum can be obtained by fitting a second-order model in the region of the maximum. The method of ridge analysis, which was introduced by Hoerl (1959) and formalized by Draper (1963), is used for this purpose.

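Let us suppose that inside a region of interest $R$, the true mean response is adequately represented by the second-order model

$$y(\mathbf{x}) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k-1} \sum_{\substack{j=2 \\ i<j}}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \epsilon, \tag{8.12}$$

where the $\beta$'s are unknown parameters and $\epsilon$ is a random error with mean zero and variance $\sigma^2$. Model (8.12) can be written as

$$y(\mathbf{x}) = \beta_0 + \mathbf{x}'\boldsymbol{\beta} + \mathbf{x}'\mathbf{B}\mathbf{x} + \epsilon, \tag{8.13}$$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)'$ and $\mathbf{B}$ is a symmetric $k \times k$ matrix of the form

$$\mathbf{B} = \begin{bmatrix}
\beta_{11} & \frac{1}{2}\beta_{12} & \frac{1}{2}\beta_{13} & \cdots & \frac{1}{2}\beta_{1k} \\
 & \beta_{22} & \frac{1}{2}\beta_{23} & \cdots & \frac{1}{2}\beta_{2k} \\
 & & \ddots & & \vdots \\
 & & & \beta_{k-1,k-1} & \frac{1}{2}\beta_{k-1,k} \\
\text{symmetric} & & & & \beta_{kk}
\end{bmatrix}.$$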

Least-squares estimates of the parameters in model (8.13) can be obtained by using data collected according to a second-order design. A description of potential second-order designs can be found in Khuri and Cornell (1996, Chapter 4).

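Let $\hat{\beta}_0$, $\hat{\boldsymbol{\beta}}$, and $\hat{\mathbf{B}}$ denote the least-squares estimates of $\beta_0$, $\boldsymbol{\beta}$, and $\mathbf{B}$, respectively. The predicted response $\hat{y}(\mathbf{x})$ inside the region $R$ is then given by

$$\hat{y}(\mathbf{x}) = \hat{\beta}_0 + \mathbf{x}'\hat{\boldsymbol{\beta}} + \mathbf{x}'\hat{\mathbf{B}}\mathbf{x}. \tag{8.14}$$

The input variables are coded so that the design center coincides with the origin of the coordinate system.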

The method of ridge analysis is used to find the optimum (maximum or minimum) of $\hat{y}(\mathbf{x})$ on concentric hyperspheres of varying radii inside the region $R$. It is particularly useful in situations in which the unconstrained optimum of $\hat{y}(\mathbf{x})$ falls outside the region $R$, or if a saddle point occurs inside $R$.

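Let us now proceed to optimize $\hat{y}(\mathbf{x})$ subject to the constraint

$$\sum_{i=1}^{k} x_i^2 = r^2, \tag{8.15}$$

where $r$ is the radius of a hypersphere centered at the origin and is contained inside the region $R$. Using the method of Lagrange multipliers, let us consider the function

$$F = \hat{y}(\mathbf{x}) - \lambda\left(\sum_{i=1}^{k} x_i^2 - r^2\right), \tag{8.16}$$

where $\lambda$ is a Lagrange multiplier. Differentiating $F$ with respect to $x_i$ ($i = 1, 2, \ldots, k$) and equating the partial derivatives to zero, we obtain

$$\begin{aligned}
\frac{\partial F}{\partial x_1} &= 2(\hat{\beta}_{11} - \lambda)x_1 + \hat{\beta}_{12}x_2 + \cdots + \hat{\beta}_{1k}x_k + \hat{\beta}_1 = 0, \\
\frac{\partial F}{\partial x_2} &= \hat{\beta}_{12}x_1 + 2(\hat{\beta}_{22} - \lambda)x_2 + \cdots + \hat{\beta}_{2k}x_k + \hat{\beta}_2 = 0, \\
&\ \ \vdots \\
\frac{\partial F}{\partial x_k} &= \hat{\beta}_{1k}x_1 + \hat{\beta}_{2k}x_2 + \cdots + 2(\hat{\beta}_{kk} - \lambda)x_k + \hat{\beta}_k = 0.
\end{aligned}$$

These equations can be expressed as

$$(\hat{\mathbf{B}} - \lambda\mathbf{I}_k)\mathbf{x} = -\tfrac{1}{2}\hat{\boldsymbol{\beta}}. \tag{8.17}$$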

Equations (8.15) and (8.17) need to be solved for $x_1, x_2, \ldots, x_k$ and $\lambda$. This traditional approach, however, requires calculations that are somewhat involved. Draper (1963) proposed the following simpler, yet equivalent procedure:

i. Regard $r$ as a variable, but fix $\lambda$ instead.

ii. Insert the selected value of $\lambda$ in equation (8.17) and solve for $\mathbf{x}$. The solution is used in steps iii and iv.

iii. Compute $r = (\mathbf{x}'\mathbf{x})^{1/2}$.

iv. Evaluate $\hat{y}(\mathbf{x})$.
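Draper's steps i to iv translate directly into code. The following minimal Python sketch fixes several values of $\lambda$, solves (8.17) by Gaussian elimination, and reports $r$ and $\hat{y}$. The two-variable estimates are hypothetical; the largest eigenvalue of this $\hat{\mathbf{B}}$ is about 0.526, so each chosen $\lambda$ exceeds it and, by Result 3 below, should give a maximum of $\hat{y}$ on its circle.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def y_hat(b0, beta, B, x):
    """Predicted response (8.14): b0 + x'beta + x'Bx."""
    k = len(x)
    quad = sum(x[i] * B[i][j] * x[j] for i in range(k) for j in range(k))
    return b0 + sum(bi * xi for bi, xi in zip(beta, x)) + quad


def ridge_point(b0, beta, B, lam):
    """Draper's steps ii-iv for a fixed lambda: solve (8.17), then compute r and y_hat."""
    k = len(beta)
    A = [[B[i][j] - (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    x = solve(A, [-0.5 * bi for bi in beta])     # step ii: (B - lam I) x = -beta/2
    r = math.hypot(*x)                            # step iii: r = (x'x)^(1/2)
    return x, r, y_hat(b0, beta, B, x)            # step iv


# Hypothetical two-variable estimates.
b0, beta = 5.0, [1.0, 0.5]
B = [[-1.0, 0.2], [0.2, 0.5]]
results = {lam: ridge_point(b0, beta, B, lam) for lam in (0.6, 0.8, 1.0, 2.0)}
for lam, (x, r, y) in results.items():
    print(f"lambda={lam:.1f}  x=({x[0]:.3f}, {x[1]:.3f})  r={r:.3f}  y_hat={y:.3f}")
```

As $\lambda$ moves away from the largest eigenvalue, the radius $r$ shrinks toward the origin, which is why a range of $\lambda$ values traces out the ridge from the periphery inward.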


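Several values of $\lambda$ can give rise to several stationary points which lie on the same hypersphere of radius $r$. This can be seen from the fact that if $\lambda$ is chosen to be different from any eigenvalue of $\hat{\mathbf{B}}$, then equation (8.17) has a unique solution given by

$$\mathbf{x} = -\tfrac{1}{2}(\hat{\mathbf{B}} - \lambda\mathbf{I}_k)^{-1}\hat{\boldsymbol{\beta}}. \tag{8.18}$$

By substituting $\mathbf{x}$ in equation (8.15) we obtain

$$\hat{\boldsymbol{\beta}}'(\hat{\mathbf{B}} - \lambda\mathbf{I}_k)^{-2}\hat{\boldsymbol{\beta}} = 4r^2. \tag{8.19}$$

After clearing denominators, (8.19) is a polynomial equation of degree $2k$ in $\lambda$. Hence, each value of $r$ gives rise to at most $2k$ corresponding values of $\lambda$.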

The choice of $\lambda$ has an effect on the nature of the stationary point. Some values of $\lambda$ produce points at each of which $\hat{y}$ has a maximum. Other values of $\lambda$ cause $\hat{y}$ to have minimum values. More specifically, suppose that $\lambda_1$ and $\lambda_2$ are two values substituted for $\lambda$ in equation (8.18). Let $\mathbf{x}_1, \mathbf{x}_2$ and $r_1, r_2$ be the corresponding values of $\mathbf{x}$ and $r$, respectively. The following results, which were established in Draper (1963), can be helpful in selecting the value of $\lambda$ that produces a particular type of stationary point:

Result 1. If $r_1 = r_2$ and $\lambda_1 > \lambda_2$, then $\hat{y}_1 > \hat{y}_2$, where $\hat{y}_1$ and $\hat{y}_2$ are the values of $\hat{y}(\mathbf{x})$ at $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively.

This result means that for two stationary points that have the same distance from the origin, $\hat{y}$ will be larger at the stationary point with the larger value of $\lambda$.

Result 2. Let $\mathbf{M}$ be the matrix of second-order partial derivatives of $F$ in formula (8.16), that is,

$$\mathbf{M} = 2(\hat{\mathbf{B}} - \lambda\mathbf{I}_k). \tag{8.20}$$

If $r_1 = r_2$, and if $\mathbf{M}$ is positive definite for $\mathbf{x}_1$ and is indefinite (that is, neither positive definite nor negative definite) for $\mathbf{x}_2$, then $\hat{y}_1 < \hat{y}_2$.

Result 3. If $\lambda_1$ is larger than the largest eigenvalue of $\hat{\mathbf{B}}$, then the corresponding solution $\mathbf{x}_1$ in formula (8.18) is a point of absolute maximum for $\hat{y}$ on a hypersphere of radius $r_1 = (\mathbf{x}_1'\mathbf{x}_1)^{1/2}$. If, on the other hand, $\lambda_1$ is smaller than the smallest eigenvalue of $\hat{\mathbf{B}}$, then $\mathbf{x}_1$ is a point of absolute minimum for $\hat{y}$ on the same hypersphere.

On the basis of Result 3 we can select several values of $\lambda$ that exceed the largest eigenvalue of $\hat{\mathbf{B}}$. The resulting values of the $k$ elements of $\mathbf{x}$ and of $\hat{y}$ can be plotted against the corresponding values of $r$. This produces $k + 1$ plots called ridge plots (see Myers, 1976, Section 5.3). They are useful in that an experimenter can determine, for a particular $r$, the maximum of $\hat{y}$ within a region $R$ and the operating conditions (that is, the elements of $\mathbf{x}$) that give rise to the maximum. Similar plots can be obtained for the minimum of $\hat{y}$ (here, values of $\lambda$ that are smaller than the smallest eigenvalue of $\hat{\mathbf{B}}$ must be chosen). Obviously, the portions of the ridge plots that fall outside $R$ should not be considered.

Example 8.3.1. An experiment was conducted to investigate the effects of three fertilizer ingredients on the yield of snap beans under field conditions. The fertilizer ingredients and actual amounts applied were nitrogen (N), from 0.94 to 6.29 lb/plot; phosphoric acid (P$_2$O$_5$), from 0.59 to 2.97 lb/plot; and potash (K$_2$O), from 0.60 to 4.22 lb/plot. The response of interest is the average yield in pounds per plot of snap beans.

Five levels of each fertilizer were used. The levels are coded using the following linear transformations:

$$x_1 = \frac{X_1 - 3.62}{1.59}, \qquad x_2 = \frac{X_2 - 1.78}{0.71}, \qquad x_3 = \frac{X_3 - 2.42}{1.07}.$$

Here, $X_1$, $X_2$, and $X_3$ denote the actual levels of nitrogen, phosphoric acid, and potash, respectively, used in the experiment, and $x_1, x_2, x_3$ the corresponding coded values. In this particular coding scheme, 3.62, 1.78, and 2.42 are the averages of the experimental levels of $X_1$, $X_2$, and $X_3$, respectively, that is, they represent the centers of the values of nitrogen, phosphoric acid, and potash, respectively. The denominators of $x_1$, $x_2$, and $x_3$ were chosen so that the second and fourth levels of each $X_i$ correspond to the values $-1$ and 1, respectively, for $x_i$ ($i = 1, 2, 3$). One advantage of such a coding scheme is to make the levels of the three fertilizers scale free (this is necessary in general, since the input variables can have different units of measurement). The measured and coded levels for the three fertilizers are shown below:


                     Levels of x_i (i = 1, 2, 3)
Fertilizer   -1.682   -1.000    0.000    1.000    1.682
N              0.94     2.03     3.62     5.21     6.29
P2O5           0.59     1.07     1.78     2.49     2.97
K2O            0.60     1.35     2.42     3.49     4.22
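The coding and its inverse are simple linear maps. A minimal Python sketch (the dictionary and function names are hypothetical) that recodes the tabulated amounts; the second and fourth levels map to $-1$ and 1 up to floating-point error, while the extreme levels map only approximately to $\pm 1.682$:

```python
# Centers and denominators of the coding transformations in Example 8.3.1.
CENTER = {"N": 3.62, "P2O5": 1.78, "K2O": 2.42}
SCALE = {"N": 1.59, "P2O5": 0.71, "K2O": 1.07}


def to_coded(fertilizer, amount):
    """Coded level x_i = (X_i - center) / scale."""
    return (amount - CENTER[fertilizer]) / SCALE[fertilizer]


def to_actual(fertilizer, coded):
    """Inverse transformation X_i = center + scale * x_i (lb/plot)."""
    return CENTER[fertilizer] + SCALE[fertilizer] * coded


LEVELS = {"N": [0.94, 2.03, 3.62, 5.21, 6.29],
          "P2O5": [0.59, 1.07, 1.78, 2.49, 2.97],
          "K2O": [0.60, 1.35, 2.42, 3.49, 4.22]}
for name, levels in LEVELS.items():
    print(name, [round(to_coded(name, v), 3) for v in levels])
```

The inverse map `to_actual` is what converts an optimum found in coded units back into fertilizer amounts in lb/plot.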


Combinations of the levels of the three fertilizers were applied according to the experimental design shown in Table 8.1, in which the design settings are given in terms of the coded levels. Six center-point replications were run in order to obtain an estimate of the experimental error variance. This particular design is called a central composite design [for a description of this design and its properties, see Khuri and Cornell (1996, Section 4.5.3)], which has the rotatability property. By this we mean that the prediction variance, that is, $\mathrm{Var}[\hat{y}(\mathbf{x})]$, is constant at all points that are equidistant from the design center [see Khuri and Cornell (1996, Section 2.8.3) for more detailed information concerning rotatability]. The corresponding response (yield) values are given in Table 8.1.

A second-order model of the form given by formula (8.12) was fitted to the data set in Table 8.1. Thus in terms of the coded variables we have the model

$$y(\mathbf{x}) = \beta_0 + \sum_{i=1}^{3} \beta_i x_i + \beta_{12}x_1x_2 + \beta_{13}x_1x_3 + \beta_{23}x_2x_3 + \sum_{i=1}^{3} \beta_{ii}x_i^2 + \epsilon. \tag{8.21}$$





Table 8.1. The Coded and Actual Settings of the Three Fertilizers
and the Corresponding Response Values

      x1      x2      x3      N   P2O5    K2O   Yield y
      -1      -1      -1   2.03   1.07   1.35     11.28
       1      -1      -1   5.21   1.07   1.35      8.44
      -1       1      -1   2.03   2.49   1.35     13.19
       1       1      -1   5.21   2.49   1.35      7.71
      -1      -1       1   2.03   1.07   3.49      8.94
       1      -1       1   5.21   1.07   3.49     10.90
      -1       1       1   2.03   2.49   3.49     11.85
       1       1       1   5.21   2.49   3.49     11.03
  -1.682       0       0   0.94   1.78   2.42      8.26
   1.682       0       0   6.29   1.78   2.42      7.87
       0  -1.682       0   3.62   0.59   2.42     12.08
       0   1.682       0   3.62   2.97   2.42     11.06
       0       0  -1.682   3.62   1.78   0.60      7.98
       0       0   1.682   3.62   1.78   4.22     10.43
       0       0       0   3.62   1.78   2.42     10.14
       0       0       0   3.62   1.78   2.42     10.22
       0       0       0   3.62   1.78   2.42     10.53
       0       0       0   3.62   1.78   2.42      9.50
       0       0       0   3.62   1.78   2.42     11.53
       0       0       0   3.62   1.78   2.42     11.02


Source: A. I. Khuri and J. A. Cornell (1996). Reproduced with permission of Marcel Dekker, Inc. 


The resulting prediction equation is given by

$$\hat{y}(\mathbf{x}) = 10.462 - 0.574x_1 + 0.183x_2 + 0.456x_3 - 0.678x_1x_2 + 1.183x_1x_3 + 0.233x_2x_3 - 0.676x_1^2 + 0.563x_2^2 - 0.273x_3^2. \tag{8.22}$$
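The coefficients in (8.22) can be recomputed from Table 8.1 by solving the normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\gamma}} = \mathbf{X}'\mathbf{y}$ for the ten model coefficients. A minimal pure-Python sketch, assuming the coded settings exactly as tabulated (axial value 1.682); the term ordering and helper names are our own choices.

```python
# Table 8.1 in coded units: (x1, x2, x3, yield in lb/plot).
RUNS = [
    (-1, -1, -1, 11.28), (1, -1, -1, 8.44), (-1, 1, -1, 13.19), (1, 1, -1, 7.71),
    (-1, -1, 1, 8.94), (1, -1, 1, 10.90), (-1, 1, 1, 11.85), (1, 1, 1, 11.03),
    (-1.682, 0, 0, 8.26), (1.682, 0, 0, 7.87), (0, -1.682, 0, 12.08),
    (0, 1.682, 0, 11.06), (0, 0, -1.682, 7.98), (0, 0, 1.682, 10.43),
    (0, 0, 0, 10.14), (0, 0, 0, 10.22), (0, 0, 0, 10.53), (0, 0, 0, 9.50),
    (0, 0, 0, 11.53), (0, 0, 0, 11.02),
]
TERMS = ["1", "x1", "x2", "x3", "x1x2", "x1x3", "x2x3", "x1^2", "x2^2", "x3^2"]


def model_terms(x1, x2, x3):
    """Row of X for the second-order model (8.21), ordered as in TERMS."""
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x1, x2 * x2, x3 * x3]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


X = [model_terms(*run[:3]) for run in RUNS]
y = [run[3] for run in RUNS]
p = len(TERMS)
XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
coef = solve(XtX, Xty)                      # normal equations (X'X) g = X'y
for name, value in zip(TERMS, coef):
    print(f"{name:>5}: {value:8.3f}")
```

The recomputed estimates agree with (8.22) to about the third decimal place.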


Here, $\hat{y}(\mathbf{x})$ is the predicted yield at the point $\mathbf{x} = (x_1, x_2, x_3)'$. Equation (8.22) can be expressed in matrix form as in equation (8.14), where $\hat{\boldsymbol{\beta}} = (-0.574, 0.183, 0.456)'$ and $\hat{\mathbf{B}}$ is the matrix

$$\hat{\mathbf{B}} = \begin{bmatrix} -0.676 & -0.339 & 0.592 \\ -0.339 & 0.563 & 0.117 \\ 0.592 & 0.117 & -0.273 \end{bmatrix}. \tag{8.23}$$


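The coordinates of the stationary point $\mathbf{x}_0$ of $\hat{y}(\mathbf{x})$ satisfy the equations

$$\begin{aligned}
\frac{\partial \hat{y}}{\partial x_1} &= -0.574 + 2(-0.676x_1 - 0.339x_2 + 0.592x_3) = 0, \\
\frac{\partial \hat{y}}{\partial x_2} &= 0.183 + 2(-0.339x_1 + 0.563x_2 + 0.117x_3) = 0, \\
\frac{\partial \hat{y}}{\partial x_3} &= 0.456 + 2(0.592x_1 + 0.117x_2 - 0.273x_3) = 0,
\end{aligned}$$

which can be expressed as

$$\hat{\boldsymbol{\beta}} + 2\hat{\mathbf{B}}\mathbf{x}_0 = \mathbf{0}. \tag{8.24}$$

Hence,

$$\mathbf{x}_0 = -\tfrac{1}{2}\hat{\mathbf{B}}^{-1}\hat{\boldsymbol{\beta}} = (-0.394, -0.364, -0.175)'.$$

The eigenvalues of $\hat{\mathbf{B}}$ are $\tau_1 = 0.6508$, $\tau_2 = 0.1298$, $\tau_3 = -1.1678$. The matrix $\hat{\mathbf{B}}$ is therefore neither positive definite nor negative definite, that is, $\mathbf{x}_0$ is a saddle point (see Corollary 7.7.1). This point falls inside the experimental region $R$, which, in the space of the coded variables $x_1, x_2, x_3$, is a sphere centered at the origin of radius $\sqrt{3}$.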
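The stationary point and the eigenvalues of $\hat{\mathbf{B}}$ can be checked numerically. A minimal Python sketch using Cramer's rule and the closed-form trigonometric solution for the eigenvalues of a symmetric $3 \times 3$ matrix; because it starts from the rounded entries of (8.23), the results can differ from the quoted values in the third decimal place.

```python
import math

beta = [-0.574, 0.183, 0.456]
B = [[-0.676, -0.339, 0.592],
     [-0.339, 0.563, 0.117],
     [0.592, 0.117, -0.273]]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def solve3(A, b):
    """Cramer's rule for a 3x3 system A x = b."""
    d = det3(A)
    x = []
    for col in range(3):
        Ac = [[b[i] if j == col else A[i][j] for j in range(3)] for i in range(3)]
        x.append(det3(Ac) / d)
    return x


def sym3_eigenvalues(A):
    """Eigenvalues of a symmetric 3x3 matrix (trigonometric closed form), largest first."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted((A[0][0], A[1][1], A[2][2]), reverse=True)
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    C = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    half_det = max(-1.0, min(1.0, det3(C) / 2.0))   # clamp rounding error
    phi = math.acos(half_det) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


x0 = solve3(B, [-0.5 * b for b in beta])     # x0 = -(1/2) B^{-1} beta, from (8.24)
eig = sym3_eigenvalues(B)
print("x0 =", [round(v, 3) for v in x0])
print("eigenvalues =", [round(v, 4) for v in eig])
```

One positive and one negative eigenvalue are enough to confirm that $\mathbf{x}_0$ is a saddle point.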

Let us now apply the method of ridge analysis to maximize y inside the 
région R. For this purpose we choose values of A [the Lagrange multiplier in 

A 

équation (8.16)] larger than = 0.6508, the largest eigenvalue of B. For each 
such value of A, équation (8.17) has a solution for x that represents a point of 
absolute maximum of y(x) on a sphere of radius r= (x'x)^/^ inside R. The 
results are displayed in Table 8.2. We note that at the point 
( — 0.558,1.640,0.087), which is located near the periphery of the région R, 
the maximum value of y is 13.021. By expressing the coordinates of this point 
in terms of the actual values of the three fertilizers we obtain = 2.733 
Ib/plot, X 2 = 2.944 Ib/plot, and ^3 = 2.513 Ib/plot. We conclude that a 
combination of nitrogen, phosphoric acid, and potash fertilizers at the rates 


Table 8.2. Ridge Analysis Values 


A 


■D 

0.979 

0.889 


■B 

0.784 

0.770 

0.754 

0.745 

0.740 




- 0.221 

- 0.269 



- 0.408 

- 0.453 

- 0.499 

- 0.544 

- 0.558 


0.102 

0.269 

0.438 

0.605 

0.771 

0.935 

1.099 

1.263 

1.426 

1.589 

1.640 

^3 

0.081 

0.110 

0.118 

0.120 

0.117 

0.113 

0.108 

0.102 

0.096 

0.089 

0.087 

r 

0.168 

0.337 

0.505 

0.673 

0.841 

1.009 

1.177 

1.346 

1.514 

1.682 

1.734 

A 

y 

10.575 

10.693 

10.841 

11.024 

11.243 

11.499 

11.790 

12.119 

12.484 

12.886 

13.021 




350 


OPTIMIZATION IN STATISTICS 


of 2.733, 2.944, and 2.513 lb/plot, respectively, results in an estimated maximum yield of snap beans of 13.021 lb/plot.
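The ridge computation behind Table 8.2 amounts to solving equation (8.17) for a sequence of multipliers. The sketch below assumes that (8.17), which is stated earlier in the chapter, has the usual ridge-analysis form $(\hat{\mathbf{B}} - \lambda\mathbf{I}_k)\mathbf{x} = -\tfrac{1}{2}\hat{\boldsymbol{\beta}}$. The fitted coefficients used here are hypothetical, because $\hat{\beta}_0$, $\hat{\boldsymbol{\beta}}$, and $\hat{\mathbf{B}}$ for the snap bean example are not reproduced in this excerpt. The function names are also illustrative.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def y_hat(b0, beta, B, x):
    """Fitted second-order response b0 + x'beta + x'Bx."""
    k = len(x)
    linear = sum(beta[i] * x[i] for i in range(k))
    quadratic = sum(x[i] * B[i][j] * x[j] for i in range(k) for j in range(k))
    return b0 + linear + quadratic


def ridge_point(b0, beta, B, lam):
    """Solve (B - lam I) x = -beta/2; return the point, its radius, and y_hat there."""
    k = len(beta)
    shifted = [[B[i][j] - (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    x = solve(shifted, [-0.5 * b for b in beta])
    r = math.sqrt(sum(v * v for v in x))
    return x, r, y_hat(b0, beta, B, x)


# Hypothetical fitted model; the largest eigenvalue of B is 0.5 here.
b0, beta = 10.0, [1.0, -0.5, 0.2]
B = [[-1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, -0.3]]
for lam in (3.0, 2.0, 1.0, 0.7):  # all exceed 0.5, so each solution is a maximum
    x, r, y = ridge_point(b0, beta, B, lam)
    print(f"lambda={lam:.2f}  r={r:.3f}  y_hat={y:.3f}")

# Sanity check: no randomly chosen point on the same sphere beats the ridge point.
random.seed(0)
x, r, y = ridge_point(b0, beta, B, 1.0)
best = max(y_hat(b0, beta, B, [r * v / math.sqrt(sum(w * w for w in d)) for v in d])
           for d in ([random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2000)))
print(f"ridge point y_hat={y:.4f}, best random point on the sphere={best:.4f}")
```

Decreasing $\lambda$ toward the largest eigenvalue of $\hat{\mathbf{B}}$ moves the solution outward, so the radius grows as $\lambda$ shrinks. This matches the pattern in Table 8.2.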


8.3.3. Modified Ridge Analysis 

Optimization of $\hat{y}(\mathbf{x})$ on a hypersphere $S$ by the method of ridge analysis is justified provided that the prediction variance on $S$ is relatively small. Furthermore, it is desirable that this variance remain constant on $S$. If not, then it is possible to obtain poor estimates of the optimum response, especially when the dispersion in the prediction variances on $S$ is large. Thus the reliability of ridge analysis as an optimum-seeking procedure depends very much on controlling the size and variability of the prediction variance. If the design is rotatable, then the prediction variance, $\mathrm{Var}[\hat{y}(\mathbf{x})]$, is constant on $S$. It is then easy to attain small prediction variances by restricting the procedure to hyperspheres of small radii. However, if the design is not rotatable, then $\mathrm{Var}[\hat{y}(\mathbf{x})]$ may vary widely on $S$, which, as was mentioned earlier, can adversely affect the quality of estimation of the optimum response. This suggests that the prediction variance should be given serious consideration in the strategy of ridge analysis if the design used is not rotatable.

Khuri and Myers (1979) proposed a modification of the method of ridge analysis that optimizes $\hat{y}(\mathbf{x})$ subject to a particular constraint on the prediction variance. Their proposed modification is described below.

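Consider model (8.12), which can be written as

$$y(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\boldsymbol{\gamma} + \epsilon, \tag{8.25}$$

where

$$\boldsymbol{\gamma} = (\beta_0, \beta_1, \ldots, \beta_k, \beta_{12}, \beta_{13}, \ldots, \beta_{k-1,k}, \beta_{11}, \beta_{22}, \ldots, \beta_{kk})'.$$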


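The predicted response is given by

$$\hat{y}(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\hat{\boldsymbol{\gamma}}, \tag{8.26}$$

where $\hat{\boldsymbol{\gamma}}$ is the least-squares estimator of $\boldsymbol{\gamma}$, namely,

$$\hat{\boldsymbol{\gamma}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}, \tag{8.27}$$

where $\mathbf{X} = [\mathbf{f}(\mathbf{x}_1), \mathbf{f}(\mathbf{x}_2), \ldots, \mathbf{f}(\mathbf{x}_n)]'$ with $\mathbf{x}_i$ being the vector of design settings at the $i$th experimental run ($i = 1, 2, \ldots, n$, where $n$ is the number of runs used in the experiment), and $\mathbf{y}$ is the corresponding vector of $n$ observations. Since

$$\mathrm{Var}(\hat{\boldsymbol{\gamma}}) = (\mathbf{X}'\mathbf{X})^{-1}\sigma^2, \tag{8.28}$$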



OPTIMIZATION TECHNIQUES IN RESPONSE SURFACE METHODOLOGY 




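where $\sigma^2$ is the error variance, it follows from equation (8.26) that the prediction variance is of the form

$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\,\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x}). \tag{8.29}$$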

The number of unknown parameters in model (8.25) is $p = (k+1)(k+2)/2$, where $k$ is the number of input variables. Let $\nu_1, \nu_2, \ldots, \nu_p$ denote the eigenvalues of $\mathbf{X}'\mathbf{X}$. Then from equation (8.29) and Theorem 2.3.16 we have

$$\frac{\sigma^2\,\mathbf{f}'(\mathbf{x})\mathbf{f}(\mathbf{x})}{\nu_{\max}} \le \mathrm{Var}[\hat{y}(\mathbf{x})] \le \frac{\sigma^2\,\mathbf{f}'(\mathbf{x})\mathbf{f}(\mathbf{x})}{\nu_{\min}},$$

where $\nu_{\min}$ and $\nu_{\max}$ are, respectively, the smallest and largest of the $\nu_j$'s. This double inequality shows that the prediction variance can be inflated if $\mathbf{X}'\mathbf{X}$ has small eigenvalues. This occurs when the columns of $\mathbf{X}$ are multicollinear (see, for example, Myers, 1990, pages 125-126 and Chapter 8). Now, by the spectral decomposition theorem (Theorem 2.3.10), $\mathbf{X}'\mathbf{X} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}'$, where $\mathbf{V}$ is an orthogonal matrix of orthonormal eigenvectors of $\mathbf{X}'\mathbf{X}$ and $\boldsymbol{\Lambda} = \mathrm{Diag}(\nu_1, \nu_2, \ldots, \nu_p)$ is a diagonal matrix of eigenvalues of $\mathbf{X}'\mathbf{X}$. Equation (8.29) can then be written as


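$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2 \sum_{j=1}^{p} \frac{[\mathbf{f}'(\mathbf{x})\mathbf{v}_j]^2}{\nu_j}, \tag{8.30}$$

where $\mathbf{v}_j$ is the $j$th column of $\mathbf{V}$ ($j = 1, 2, \ldots, p$). If we denote the elements of $\mathbf{v}_j$ by $v_{0j}, v_{1j}, \ldots, v_{kj}, v_{12j}, v_{13j}, \ldots, v_{k-1,kj}, v_{11j}, v_{22j}, \ldots, v_{kkj}$, then $\mathbf{f}'(\mathbf{x})\mathbf{v}_j$ can be expressed as

$$\mathbf{f}'(\mathbf{x})\mathbf{v}_j = v_{0j} + \mathbf{x}'\boldsymbol{\tau}_j + \mathbf{x}'\mathbf{T}_j\mathbf{x}, \qquad j = 1, 2, \ldots, p, \tag{8.31}$$

where $\boldsymbol{\tau}_j = (v_{1j}, v_{2j}, \ldots, v_{kj})'$ and

$$\mathbf{T}_j = \begin{bmatrix} v_{11j} & \tfrac{1}{2}v_{12j} & \tfrac{1}{2}v_{13j} & \cdots & \tfrac{1}{2}v_{1kj} \\ & v_{22j} & \tfrac{1}{2}v_{23j} & \cdots & \tfrac{1}{2}v_{2kj} \\ & & \ddots & & \vdots \\ & & & v_{k-1,k-1,j} & \tfrac{1}{2}v_{k-1,kj} \\ \text{symmetric} & & & & v_{kkj} \end{bmatrix}, \qquad j = 1, 2, \ldots, p.$$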

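We note that the form of $\mathbf{f}'(\mathbf{x})\mathbf{v}_j$, as given by formula (8.31), is identical to that of a second-order model. Formula (8.30) can then be written as

$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2 \sum_{j=1}^{p} \frac{(v_{0j} + \mathbf{x}'\boldsymbol{\tau}_j + \mathbf{x}'\mathbf{T}_j\mathbf{x})^2}{\nu_j}. \tag{8.32}$$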
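Formulas (8.30) and (8.32) can be checked numerically. Splitting $\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})$ along the eigenvectors of $\mathbf{X}'\mathbf{X}$ must reproduce the direct quadratic form, and the $j$th term shows how much of the variance comes from $\nu_j$. The Python sketch below does this for a one-variable quadratic model. It uses a made-up five-point design and a hand-written Jacobi eigen-solver, neither of which comes from the text, and it reports variances in units of $\sigma^2$.

```python
import math


def jacobi_eigh(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues and orthonormal eigenvectors of a symmetric matrix (cyclic Jacobi)."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of a and of the eigenvector matrix
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][j] for k in range(n)] for j in range(n)]  # vectors[j] = j-th eigenvector
    return values, vectors


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def f(x):
    """Regressors of the one-variable second-order model: (1, x, x^2)."""
    return [1.0, x, x * x]


design = [-1.0, -0.5, 0.0, 0.6, 1.0]  # hypothetical design points
X = [f(u) for u in design]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
nu, vecs = jacobi_eigh(XtX)


def variance_terms(x):
    """Terms [f'(x) v_j]^2 / nu_j of formula (8.30), in units of sigma^2."""
    fx = f(x)
    return [sum(a * b for a, b in zip(fx, vj)) ** 2 / lam for lam, vj in zip(nu, vecs)]


def variance_direct(x):
    """f'(x)(X'X)^{-1}f(x), i.e. formula (8.29) divided by sigma^2."""
    fx = f(x)
    return sum(a * b for a, b in zip(fx, solve(XtX, fx)))


m = min(range(3), key=lambda j: nu[j])  # index of the smallest eigenvalue
for x in (-1.0, 0.3, 0.9):
    terms = variance_terms(x)
    print(f"x={x:+.1f}  direct={variance_direct(x):.4f}  sum={sum(terms):.4f}  "
          f"share of nu_min term={terms[m] / sum(terms):.2f}")
```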





As was noted earlier, small values of $\nu_j$ ($j = 1, 2, \ldots, p$) cause $\hat{y}(\mathbf{x})$ to have large variances.

To reduce the size of the prediction variance within the region explored by ridge analysis, we can consider putting constraints on the portion of $\mathrm{Var}[\hat{y}(\mathbf{x})]$ that corresponds to $\nu_{\min}$. It makes sense to optimize $\hat{y}(\mathbf{x})$ subject to the constraints

$$\mathbf{x}'\mathbf{x} = r^2, \tag{8.33}$$

$$|v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}| \le q, \tag{8.34}$$

where $v_{0m}$, $\boldsymbol{\tau}_m$, and $\mathbf{T}_m$ are the values of $v_{0j}$, $\boldsymbol{\tau}_j$, and $\mathbf{T}_j$ that correspond to $\nu_{\min}$. Here, $q$ is a positive constant chosen small enough to offset the small value of $\nu_{\min}$. Khuri and Myers (1979) suggested that $q$ be equal to the largest value taken by $|v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}|$ at the $n$ design points. The rationale behind this rule of thumb is that the prediction variance is smaller at the design points than at other points in the experimental region.

The modification suggested by Khuri and Myers (1979) amounts to adding the constraint (8.34) to the usual procedure of ridge analysis. In this way, some control can be maintained on the size of the prediction variance during the optimization process. The mathematical algorithm needed for this constrained optimization is based on a technique introduced by Myers and Carter (1973) for a dual response system in which a primary second-order response function is optimized subject to the condition that a constrained second-order response function takes on some specified or desirable values. Here, the primary response is $\hat{y}(\mathbf{x})$ and the constrained response is $v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}$.

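Myers and Carter's (1973) procedure is based on the method of Lagrange multipliers, which uses the function

$$L = \hat{\beta}_0 + \mathbf{x}'\hat{\boldsymbol{\beta}} + \mathbf{x}'\hat{\mathbf{B}}\mathbf{x} - \mu(v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x} - \omega) - \lambda(\mathbf{x}'\mathbf{x} - r^2),$$

where $\hat{\beta}_0$, $\hat{\boldsymbol{\beta}}$, and $\hat{\mathbf{B}}$ are the same as in model (8.14), $\mu$ and $\lambda$ are Lagrange multipliers, $\omega$ is such that $|\omega| \le q$ [see inequality (8.34)], and $r$ is the radius of a hypersphere centered at the origin and contained inside a region of interest $R$. By differentiating $L$ with respect to $x_1, x_2, \ldots, x_k$ and equating the derivatives to zero, we obtain

$$(\hat{\mathbf{B}} - \mu\mathbf{T}_m - \lambda\mathbf{I}_k)\mathbf{x} = \tfrac{1}{2}(\mu\boldsymbol{\tau}_m - \hat{\boldsymbol{\beta}}). \tag{8.35}$$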

As in the method of ridge analysis, values of $\mu$ and $\lambda$ for solving equation (8.35) are chosen directly, so that the solution represents a point of maximum (or minimum) for $\hat{y}(\mathbf{x})$. Thus for a given value of $\mu$, the matrix of second-order partial derivatives of $L$, namely $2(\hat{\mathbf{B}} - \mu\mathbf{T}_m - \lambda\mathbf{I}_k)$, is made negative definite [and hence a maximum of $\hat{y}(\mathbf{x})$ is achieved] by selecting $\lambda$ larger than the largest eigenvalue of $\hat{\mathbf{B}} - \mu\mathbf{T}_m$. For $\hat{y}(\mathbf{x})$ to attain a minimum, values of $\lambda$ smaller than the smallest eigenvalue of $\hat{\mathbf{B}} - \mu\mathbf{T}_m$ should be considered. It follows that for such an assignment of values for $\mu$ and $\lambda$, the corresponding solution of equation (8.35) produces an optimum for $\hat{y}$ subject to a fixed $r = (\mathbf{x}'\mathbf{x})^{1/2}$ and a fixed value of $v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}$.
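For fixed multipliers, equation (8.35) is a linear system, so one step of modified ridge analysis can be sketched as below. This is an illustrative Python sketch under stated assumptions. The coefficient arrays are hypothetical and are not the fitted values of any example in the text, and the helper names are invented for the illustration. In practice one would scan a grid of $(\mu, \lambda)$ values and keep the solutions that satisfy (8.33) and (8.34). The guard on $\lambda$ implements the requirement that $\lambda$ exceed the largest eigenvalue of $\hat{\mathbf{B}} - \mu\mathbf{T}_m$.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def jacobi_eigenvalues(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues (ascending) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return sorted(a[i][i] for i in range(n))


def modified_ridge_point(beta, B, tau, T, mu, lam):
    """Solve (B - mu T - lam I) x = (mu tau - beta)/2, equation (8.35), for a maximum."""
    k = len(beta)
    b_mu = [[B[i][j] - mu * T[i][j] for j in range(k)] for i in range(k)]
    if lam <= max(jacobi_eigenvalues(b_mu)):
        raise ValueError("lam must exceed the largest eigenvalue of B - mu*T")
    lhs = [[b_mu[i][j] - (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve(lhs, [0.5 * (mu * tau[i] - beta[i]) for i in range(k)])


def constrained_response(v0, tau, T, x):
    """v0 + x'tau + x'Tx, the quantity bounded in inequality (8.34)."""
    k = len(x)
    return v0 + sum(tau[i] * x[i] for i in range(k)) + sum(
        x[i] * T[i][j] * x[j] for i in range(k) for j in range(k))


# Hypothetical inputs, for illustration only.
beta = [1.0, -0.5, 0.2]
B = [[-1.0, 0.2, 0.0], [0.2, 0.5, 0.1], [0.0, 0.1, -0.3]]
v0, tau = -0.3, [0.05, 0.4, 0.4]
T = [[0.1, 0.01, 0.27], [0.01, -0.14, -0.01], [0.27, -0.01, 0.65]]
mu = 0.5
shift = max(jacobi_eigenvalues([[B[i][j] - mu * T[i][j] for j in range(3)] for i in range(3)]))
for lam in (shift + 2.0, shift + 1.0, shift + 0.5):
    x = modified_ridge_point(beta, B, tau, T, mu, lam)
    r = math.sqrt(sum(v * v for v in x))
    print(f"lambda={lam:.3f}  r={r:.3f}  "
          f"constrained response={constrained_response(v0, tau, T, x):.4f}")
```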


Example 8.3.2. An attempt was made to design an experiment from which one could find conditions on the concentrations of three basic substances that maximize a certain mechanical modular property of a solid propellant. The initial intent was to construct and use a central composite design (see Khuri and Cornell, 1996, Section 4.5.3) for the three components in the system. However, certain experimental difficulties prohibited the use of the design as planned, and the design actually used led to problems with multicollinearity in fitting the second-order model. The design settings and corresponding response values are given in Table 8.3.

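In this example, the smallest eigenvalue of $\mathbf{X}'\mathbf{X}$ is $\nu_{\min} = 0.0321$. Correspondingly, the values of $v_{0m}$, $\boldsymbol{\tau}_m$, and $\mathbf{T}_m$ in inequality (8.34) are $v_{0m} = -0.2935$, $\boldsymbol{\tau}_m = (0.0469, 0.4081, 0.4071)'$, and

$$\mathbf{T}_m = \begin{bmatrix} 0.1129 & 0.0095 & 0.2709 \\ 0.0095 & -0.1382 & -0.0148 \\ 0.2709 & -0.0148 & 0.6453 \end{bmatrix}.$$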


As for $q$ in inequality (8.34), values of $|v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}|$ were computed at each of the 15 design points in Table 8.3. The largest value was found to

Table 8.3. Design Settings and Response Values for Example 8.3.2

    x₁        x₂        x₃         y
  -1.020    -1.402    -0.998    13.5977
   0.900     0.478    -0.818    12.7838
   0.870    -1.282     0.882    16.2780
  -0.950     0.458     0.972    14.1678
  -0.930    -1.242    -0.868     9.2461
   0.750     0.498    -0.618    17.0167
   0.830    -1.092     0.732    13.4253
  -0.950     0.378     0.832    16.0967
   1.950    -0.462     0.002    14.5438
  -2.150    -0.402    -0.038    20.9534
  -0.550     0.058    -0.518    11.0411
  -0.450     1.378     0.182    21.2088
   0.150     1.208     0.082    25.5514
   0.100     1.768    -0.008    33.3793
   1.450    -0.342     0.182    15.4341

Source: Khuri and Myers (1979). Reproduced with permission of the American Statistical Association.





Table 8.4. Results of Modified Ridge Analysis

r                        0.848   1.162   1.530   1.623   1.795   1.850   1.904   1.935   2.000
|v0m + x'τm + x'Tm x|    0.006   0.074   0.136   0.139   0.048   0.086   0.126   0.146   0.165
Var[ŷ(x)]/σ²             1.170   1.635   2.922   3.147   1.305   2.330   3.510   4.177   5.336
x₁                       0.410   0.563   0.773   0.785   0.405   0.601   0.750   0.820   0.965
x₂                       0.737   1.015   1.320   1.422   1.752   1.751   1.750   1.752   1.752
x₃                       0.097   0.063   0.019   0.011   0.000   0.000   0.012   0.015  -0.030
ŷ(x)                    22.420  27.780  35.242  37.190  37.042  40.222  42.830  44.110  46.260

Source: Khuri and Myers (1979). Reproduced with permission of the American Statistical Association.


Table 8.5. Results of Standard Ridge Analysis

r                               0.140   0.379   0.698   0.938   1.146   1.394   1.484   1.744   1.944   1.975   2.000
|v0m + x'τm + x'Tm x|           0.241   0.124   0.104   0.337   0.587   0.942   1.085   1.553   1.958   2.025   2.080
(v0m + x'τm + x'Tm x)²/ν_min    1.804   0.477   0.337   3.543  10.718  27.631  36.641  75.163 119.371 127.815 134.735
Var[ŷ(x)]/σ²                    2.592   1.554   2.104   6.138  14.305  32.787  42.475  83.38  129.834 138.668 145.907
x₁                              0.037   0.152   0.352   0.515   0.660   0.835   0.899   1.085   1.227   1.249   1.265
x₂                              0.103   0.255   0.422   0.531   0.618   0.716   0.749   0.845   0.916   0.927   0.936
x₃                              0.087   0.235   0.431   0.577   0.705   0.858   0.912   1.074   1.197   1.217   1.232
ŷ(x)                           12.796  16.021  21.365  26.229  31.086  37.640  40.197  48.332  55.147  56.272  57.176

Source: Khuri and Myers (1979). Reproduced with permission of the American Statistical Association.


be 0.087. Hence, the value of $|v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}|$ should not grow much larger than 0.09 in the experimental region. Furthermore, $r$ in equation (8.33) must not exceed the value 2, since most of the design points are contained inside a sphere of radius 2. The results of maximizing $\hat{y}(\mathbf{x})$ subject to this dual constraint are given in Table 8.4. For the sake of comparison, Table 8.5 displays the results of applying the standard procedure of ridge analysis (that is, without the additional constraint concerning $v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}$).

It is clear from Tables 8.4 and 8.5 that the extra constraint concerning $v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x}$ has profoundly improved the precision of $\hat{y}$ at the estimated maxima. At a comparable radius, the value of $\hat{y}$ obtained under standard ridge analysis is higher than the one obtained under modified ridge analysis. However, the prediction variance values under the latter procedure are much smaller, as can be seen from comparing Tables 8.4 and 8.5. The tradeoff between a high response value and a small prediction variance is somewhat difficult to resolve from a decision-making standpoint, but the results displayed in Table 8.4 are clearly superior. For example, because of the accompanying large prediction variances, one would hardly choose any operating conditions in Table 8.5 that indicate $\hat{y} > 50$. On the other hand, Table 8.4 reveals that at radius $r = 2.000$, $\hat{y} = 46.26$ with $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$









RESPONSE SURFACE DESIGNS 




$= 5.336$, while a rival set of coordinates at $r = 1.744$ for standard ridge analysis gives $\hat{y} = 48.332$ with $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2 = 83.38$.

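Row 3 of Table 8.5 gives values of $(v_{0m} + \mathbf{x}'\boldsymbol{\tau}_m + \mathbf{x}'\mathbf{T}_m\mathbf{x})^2/\nu_{\min}$. These should be compared with the corresponding values of $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$ in row 4 of the same table. One can easily see that in this example, the term associated with $\nu_{\min}$ accounts for a large portion of $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$.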
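The last statement can be checked directly from the entries of Table 8.5. The short Python sketch below is an illustration and not part of the original analysis. It recomputes row 3 from row 2 using $\nu_{\min} = 0.0321$ from Example 8.3.2. It then forms the ratio of row 3 to row 4, that is, the fraction of $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$ contributed by the term associated with $\nu_{\min}$. Small discrepancies reflect the three-decimal rounding of the tabulated values.

```python
nu_min = 0.0321  # smallest eigenvalue of X'X in Example 8.3.2

# Rows 1-4 of Table 8.5 (standard ridge analysis).
r = [0.140, 0.379, 0.698, 0.938, 1.146, 1.394, 1.484, 1.744, 1.944, 1.975, 2.000]
abs_c = [0.241, 0.124, 0.104, 0.337, 0.587, 0.942, 1.085, 1.553, 1.958, 2.025, 2.080]
row3 = [1.804, 0.477, 0.337, 3.543, 10.718, 27.631, 36.641, 75.163, 119.371, 127.815, 134.735]
var = [2.592, 1.554, 2.104, 6.138, 14.305, 32.787, 42.475, 83.38, 129.834, 138.668, 145.907]

recomputed = [c * c / nu_min for c in abs_c]           # (v0m + x'tau_m + x'T_m x)^2 / nu_min
rel_err = [abs(a - b) / b for a, b in zip(recomputed, row3)]
share = [a / b for a, b in zip(row3, var)]             # row 3 divided by row 4

for ri, rec, tab, s in zip(r, recomputed, row3, share):
    print(f"r={ri:.3f}  recomputed={rec:8.3f}  tabulated={tab:8.3f}  share={s:.2f}")
```

For $r \ge 1.146$ the ratio exceeds 0.7, and near $r = 2$ it is above 0.9.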


8.4. RESPONSE SURFACE DESIGNS 

We recall from Section 8.3 that one of the objectives of response surface methodology is the selection of a response surface design according to a certain optimality criterion. Design selection entails the specification of the settings of a group of input variables that can be used as experimental runs in a given experiment.

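The proper choice of a response surface design can have a profound effect on the success of a response surface exploration. To see this, let us suppose that the fitted model is linear of the form

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \tag{8.36}$$

where $\mathbf{y}$ is an $n \times 1$ vector of observations, $\mathbf{X}$ is an $n \times p$ known matrix that depends on the design settings, $\boldsymbol{\beta}$ is a vector of $p$ unknown parameters, and $\boldsymbol{\epsilon}$ is a vector of random errors in the elements of $\mathbf{y}$. Typically, $\boldsymbol{\epsilon}$ is assumed to have the normal distribution $N(\mathbf{0}, \sigma^2\mathbf{I}_n)$, where $\sigma^2$ is unknown. In this case, the vector $\boldsymbol{\beta}$ is estimated by the least-squares estimator $\hat{\boldsymbol{\beta}}$, which is given by

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \tag{8.37}$$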

If $x_1, x_2, \ldots, x_k$ are the input variables for the model under consideration, then the predicted response at a point $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ in a region of interest $R$ is written as

$$\hat{y}(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\hat{\boldsymbol{\beta}}, \tag{8.38}$$

where $\mathbf{f}(\mathbf{x})$ is a $p \times 1$ vector whose first element is equal to one and whose remaining $p - 1$ elements are functions of $x_1, x_2, \ldots, x_k$. These functions are in the form of powers and cross products of powers of the $x_i$'s up to degree $d$. In this case, the model is said to be of order $d$. At the $u$th experimental run, $\mathbf{x}_u = (x_{u1}, x_{u2}, \ldots, x_{uk})'$ and the corresponding response value is $y_u$ ($u = 1, 2, \ldots, n$). The $n \times k$ matrix $\mathbf{D} = [\mathbf{x}_1 : \mathbf{x}_2 : \cdots : \mathbf{x}_n]'$ is the design matrix. Thus by a choice of design we mean the specification of the elements of $\mathbf{D}$.

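If model (8.36) is correct, then $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$ and its variance-covariance matrix is given by

$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}. \tag{8.39}$$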





Hence, from formula (8.38), the prediction variance can be written as

$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\,\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x}). \tag{8.40}$$

The design $\mathbf{D}$ is rotatable if $\mathrm{Var}[\hat{y}(\mathbf{x})]$ remains constant at all points that are equidistant from the design center, as we may recall from Section 8.3. The input variables are coded so that the center of the design coincides with the origin of the coordinate system (see Khuri and Cornell, 1996, Section 2.8).


8.4.1. First-Order Designs 

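If model (8.36) is of the first order (that is, $d = 1$), then the matrix $\mathbf{X}$ is of the form $\mathbf{X} = [\mathbf{1}_n : \mathbf{D}]$, where $\mathbf{1}_n$ is a vector of ones of order $n \times 1$ [see model (8.8)]. The input variables can be coded in such a way that the sum of the elements in each column of $\mathbf{D}$ is equal to zero. Then $\mathbf{X}'\mathbf{X}$ is block diagonal with blocks $n$ and $\mathbf{D}'\mathbf{D}$, so the prediction variance in formula (8.40) can be written as

$$\mathrm{Var}[\hat{y}(\mathbf{x})] = \sigma^2\left[\frac{1}{n} + \mathbf{x}'(\mathbf{D}'\mathbf{D})^{-1}\mathbf{x}\right]. \tag{8.41}$$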


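Formula (8.41) clearly shows the dependence of the prediction variance on the design matrix.

A reasonable criterion for the choice of $\mathbf{D}$ is the minimization of $\mathrm{Var}[\hat{y}(\mathbf{x})]$, or equivalently, the minimization of $\mathbf{x}'(\mathbf{D}'\mathbf{D})^{-1}\mathbf{x}$ within the region $R$. To accomplish this we first note that for any $\mathbf{x}$ in the region $R$,

$$\mathbf{x}'(\mathbf{D}'\mathbf{D})^{-1}\mathbf{x} \le \|\mathbf{x}\|_2^2\,\|(\mathbf{D}'\mathbf{D})^{-1}\|_2, \tag{8.42}$$

where $\|\mathbf{x}\|_2 = (\mathbf{x}'\mathbf{x})^{1/2}$ and $\|(\mathbf{D}'\mathbf{D})^{-1}\|_2 = \left[\sum_{i=1}^{k}\sum_{j=1}^{k}(d^{ij})^2\right]^{1/2}$ is the Euclidean norm of $(\mathbf{D}'\mathbf{D})^{-1}$ with $d^{ij}$ being its $(i, j)$th element ($i, j = 1, 2, \ldots, k$). Inequality (8.42) follows from applying Theorems 2.3.16 and 2.3.20. Thus by choosing the design $\mathbf{D}$ so that it minimizes $\|(\mathbf{D}'\mathbf{D})^{-1}\|_2$, the upper bound on $\mathbf{x}'(\mathbf{D}'\mathbf{D})^{-1}\mathbf{x}$ in (8.42), and with it the prediction variance, can be reduced throughout the region $R$.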
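Formulas (8.41) and (8.42) are easy to evaluate for a candidate design. The Python sketch below uses hypothetical helper names. It computes $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2 = 1/n + \mathbf{x}'(\mathbf{D}'\mathbf{D})^{-1}\mathbf{x}$ and the bound in (8.42), using the Euclidean (Frobenius) matrix norm defined above. It compares the $2^3$ factorial design with a made-up six-run design in two variables whose columns are centered but not orthogonal. Both designs are assumptions chosen for illustration.

```python
import math
from itertools import product


def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def crossprod(D):
    """D'D for a design matrix given as a list of runs."""
    k = len(D[0])
    return [[sum(run[i] * run[j] for run in D) for j in range(k)] for i in range(k)]


def quad_form(a, x):
    return sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def frobenius(a):
    """Euclidean norm of a matrix: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for row in a for v in row))


def scaled_pred_var(D, x):
    """Var[y_hat(x)]/sigma^2 = 1/n + x'(D'D)^{-1}x, formula (8.41); columns of D centered."""
    return 1.0 / len(D) + quad_form(inverse(crossprod(D)), x)


factorial = [list(run) for run in product((-1.0, 1.0), repeat=3)]  # 2^3 factorial
skewed = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5], [-0.5, -0.5]]

for D, x in ((factorial, [1.0, 1.0, 1.0]), (skewed, [1.0, 1.0])):
    inv = inverse(crossprod(D))
    bound = sum(v * v for v in x) * frobenius(inv)  # right-hand side of (8.42)
    print(f"n={len(D)}  Var/sigma^2={scaled_pred_var(D, x):.4f}  "
          f"x'(D'D)^-1 x={quad_form(inv, x):.4f} <= {bound:.4f}")
```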


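Theorem 8.4.1. For a given number $n$ of experimental runs, $\|(\mathbf{D}'\mathbf{D})^{-1}\|_2$ attains its minimum if the columns $\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_k$ of $\mathbf{D}$ are such that $\mathbf{d}_i'\mathbf{d}_j = 0$, $i \ne j$, and $\mathbf{d}_i'\mathbf{d}_i$ is as large as possible inside the region $R$.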


Proof. We have that $\mathbf{D} = [\mathbf{d}_1 : \mathbf{d}_2 : \cdots : \mathbf{d}_k]$. The elements of $\mathbf{d}_i$ are the $n$ design settings of the input variable $x_i$ ($i = 1, 2, \ldots, k$). Suppose that the region $R$ places the following restrictions on these settings:

$$\mathbf{d}_i'\mathbf{d}_i \le c_i^2, \qquad i = 1, 2, \ldots, k, \tag{8.43}$$

where $c_i^2$ is some fixed constant. This means that the spread of the design in the direction of the $i$th coordinate axis is bounded by $c_i^2$ ($i = 1, 2, \ldots, k$).

Now, if $d_{ii}$ denotes the $i$th diagonal element of $\mathbf{D}'\mathbf{D}$, then $d_{ii} = \mathbf{d}_i'\mathbf{d}_i$ ($i = 1, 2, \ldots, k$). Furthermore,

$$d^{ii} \ge \frac{1}{d_{ii}}, \qquad i = 1, 2, \ldots, k, \tag{8.44}$$

where $d^{ii}$ is the $i$th diagonal element of $(\mathbf{D}'\mathbf{D})^{-1}$.

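To prove inequality (8.44), let $\mathbf{D}_i$ be a matrix of order $n \times (k-1)$ obtained from $\mathbf{D}$ by removing its $i$th column $\mathbf{d}_i$ ($i = 1, 2, \ldots, k$). The cofactor of $d_{ii}$ in $\mathbf{D}'\mathbf{D}$ is then $\det(\mathbf{D}_i'\mathbf{D}_i)$. Hence, from Section 2.3.3,

$$d^{ii} = \frac{\det(\mathbf{D}_i'\mathbf{D}_i)}{\det(\mathbf{D}'\mathbf{D})}, \qquad i = 1, 2, \ldots, k. \tag{8.45}$$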


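There exists an orthogonal matrix $\mathbf{E}_i$ of order $k \times k$ (whose determinant has an absolute value of one) such that the first column of $\mathbf{D}\mathbf{E}_i$ is $\mathbf{d}_i$ and the remaining columns are the same as those of $\mathbf{D}_i$, that is,

$$\mathbf{D}\mathbf{E}_i = [\mathbf{d}_i : \mathbf{D}_i], \qquad i = 1, 2, \ldots, k.$$

It follows that [see property 7 in Section 2.3.3]

$$\det(\mathbf{D}'\mathbf{D}) = \det(\mathbf{E}_i'\mathbf{D}'\mathbf{D}\mathbf{E}_i) = \det(\mathbf{D}_i'\mathbf{D}_i)\left[\mathbf{d}_i'\mathbf{d}_i - \mathbf{d}_i'\mathbf{D}_i(\mathbf{D}_i'\mathbf{D}_i)^{-1}\mathbf{D}_i'\mathbf{d}_i\right].$$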


Hence, from (8.45) we obtain

$$d^{ii} = \left[d_{ii} - \mathbf{d}_i'\mathbf{D}_i(\mathbf{D}_i'\mathbf{D}_i)^{-1}\mathbf{D}_i'\mathbf{d}_i\right]^{-1}, \qquad i = 1, 2, \ldots, k. \tag{8.46}$$


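Inequality (8.44) now follows from formula (8.46), since $\mathbf{d}_i'\mathbf{D}_i(\mathbf{D}_i'\mathbf{D}_i)^{-1}\mathbf{D}_i'\mathbf{d}_i \ge 0$. We can therefore write

$$\|(\mathbf{D}'\mathbf{D})^{-1}\|_2 \ge \left[\sum_{i=1}^{k}(d^{ii})^2\right]^{1/2} \ge \left[\sum_{i=1}^{k}\frac{1}{d_{ii}^2}\right]^{1/2}.$$

Using the restrictions (8.43) we then have

$$\|(\mathbf{D}'\mathbf{D})^{-1}\|_2 \ge \left[\sum_{i=1}^{k}\frac{1}{c_i^4}\right]^{1/2}.$$

Equality is achieved if the columns of $\mathbf{D}$ are orthogonal to one another and $d_{ii} = c_i^2$ ($i = 1, 2, \ldots, k$); in that case $(\mathbf{D}'\mathbf{D})^{-1}$ is diagonal, so the first inequality above is also an equality. The second follows from the fact that $d^{ii} = 1/d_{ii}$ if and only if $\mathbf{d}_i'\mathbf{D}_i = \mathbf{0}'$ ($i = 1, 2, \ldots, k$), as can be seen from formula (8.46). □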
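The key facts in the proof, inequality (8.44) and formula (8.46), can be confirmed numerically. The Python sketch below uses a made-up five-run, three-variable design $\mathbf{D}$ that does not come from the text; its $\mathbf{D}'\mathbf{D}$ has 4 on the diagonal and $\pm 1$ off it. The sketch computes $d^{ii}$ both from the inverse and from (8.46), and compares it with $1/d_{ii}$. An orthogonal $2^2$ factorial is included to show the case of equality. The helper names are illustrative.

```python
def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def crossprod(D):
    """D'D for a design matrix given as a list of runs."""
    k = len(D[0])
    return [[sum(run[i] * run[j] for run in D) for j in range(k)] for i in range(k)]


def via_formula_846(D, i):
    """d^{ii} = [d_ii - d_i' D_i (D_i'D_i)^{-1} D_i' d_i]^{-1}, formula (8.46)."""
    d_i = [run[i] for run in D]
    D_i = [[v for j, v in enumerate(run) if j != i] for run in D]  # D without column i
    u = [sum(D_i[r][j] * d_i[r] for r in range(len(D))) for j in range(len(D_i[0]))]
    inv = inverse(crossprod(D_i))
    projection = sum(u[a] * inv[a][b] * u[b] for a in range(len(u)) for b in range(len(u)))
    return 1.0 / (sum(v * v for v in d_i) - projection)


D = [[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]]
DtD = crossprod(D)  # [[4, 1, 1], [1, 4, -1], [1, -1, 4]]
inv = inverse(DtD)
for i in range(3):
    print(f"i={i + 1}: d^ii={inv[i][i]:.4f}  via (8.46)={via_formula_846(D, i):.4f}  "
          f"1/d_ii={1 / DtD[i][i]:.4f}")

F = [[a, b] for a in (-1.0, 1.0) for b in (-1.0, 1.0)]  # orthogonal 2^2 factorial
F_inv = inverse(crossprod(F))
print("orthogonal design:", [F_inv[i][i] for i in range(2)], "vs 1/d_ii = 0.25")
```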

Definition 8.4.1. A design for fitting a first-order model is said to be orthogonal if its columns are orthogonal to one another. □

Corollary 8.4.1. For a given number $n$ of experimental runs, $\mathrm{Var}(\hat{\beta}_i)$ attains a minimum if and only if the design is orthogonal, where $\hat{\beta}_i$ is the least-squares estimator of $\beta_i$ in model (8.8), $i = 1, 2, \ldots, k$.

Proof. This follows directly from Theorem 8.4.1 and the fact that $\mathrm{Var}(\hat{\beta}_i) = \sigma^2 d^{ii}$ ($i = 1, 2, \ldots, k$), as can be seen from formula (8.39). □

From Theorem 8.4.1 and Corollary 8.4.1 we conclude that an orthogonal design for fitting a first-order model has optimal variance properties. Another advantage of orthogonal first-order designs is that the effects of the $k$ input variables in model (8.8), as measured by the values of the $\beta_i$'s ($i = 1, 2, \ldots, k$), can be estimated independently. This is because the off-diagonal elements of the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ in formula (8.39) are zero. This means that the elements of $\hat{\boldsymbol{\beta}}$ are uncorrelated and hence statistically independent under the assumption of normality of the random error vector $\boldsymbol{\epsilon}$ in model (8.36).

Examples of first-order orthogonal designs are given in Khuri and Cornell (1996, Chapter 3). Prominent among these designs are the $2^k$ factorial design (each input variable has two levels, and the number of all possible combinations of these levels is $2^k$) and the Plackett-Burman design, which was introduced in Plackett and Burman (1946). In the latter design, the number of design points is equal to $k + 1$, which must be a multiple of 4.
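Both designs can be generated and checked for orthogonality in the sense of Definition 8.4.1. In the Python sketch below, the $2^k$ factorial is built with `itertools.product`. An 8-run Plackett-Burman design for $k = 7$ is built from cyclic shifts of the generator row $(+,+,+,-,+,-,-)$, followed by a row of minus signs. This generator row is taken from standard tables of Plackett-Burman designs and is an assumption here, since the text does not list it; the code verifies orthogonality rather than relying on it.

```python
from itertools import product


def factorial_2k(k):
    """All 2^k combinations of the coded levels -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def plackett_burman_8():
    """8-run Plackett-Burman design for k = 7 factors (cyclic construction)."""
    g = [1, 1, 1, -1, 1, -1, -1]  # assumed generator row
    rows = [g[len(g) - i:] + g[:len(g) - i] for i in range(len(g))]  # cyclic shifts
    return rows + [[-1] * len(g)]


def is_orthogonal(D):
    """True if every column sums to zero and distinct columns are orthogonal."""
    k = len(D[0])
    cols = [[run[j] for run in D] for j in range(k)]
    centered = all(sum(c) == 0 for c in cols)
    orthogonal = all(sum(a * b for a, b in zip(cols[i], cols[j])) == 0
                     for i in range(k) for j in range(i + 1, k))
    return centered and orthogonal


for name, D in (("2^3 factorial", factorial_2k(3)), ("Plackett-Burman", plackett_burman_8())):
    print(f"{name}: n={len(D)}, k={len(D[0])}, orthogonal={is_orthogonal(D)}")
```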


8.4.2. Second-Order Designs 

These designs are used to fit second-order models of the form given by (8.12). Since the number of parameters in this model is $p = (k+1)(k+2)/2$, the number of experimental runs (or design points) in a second-order design must be at least equal to $p$. The most frequently used second-order designs include the $3^k$ design (each input variable has three levels, and the number of all possible combinations of these levels is $3^k$), the central composite design (CCD), and the Box-Behnken design.





The CCD was introduced by Box and Wilson (1951). It consists of three parts:

- a factorial portion consisting of a $2^k$ factorial design;
- an axial portion of $k$ pairs of points, where the $i$th pair consists of two symmetric points on the $i$th coordinate axis ($i = 1, 2, \ldots, k$) at a distance of $\alpha$ ($> 0$) from the design center (which, by the coding scheme, coincides with the center of the coordinate system);
- $n_0$ ($\ge 1$) center-point runs.

The values of $\alpha$ and $n_0$ can be chosen so that the CCD acquires certain desirable features (see, for example, Khuri and Cornell, 1996, Section 4.5.3). In particular, if $\alpha = F^{1/4}$, where $F$ denotes the number of points in the factorial portion, then the CCD is rotatable. The choice of $n_0$ can affect the stability of the prediction variance.
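The rotatability condition $\alpha = F^{1/4}$ can be checked numerically. The Python sketch below is an illustration with hypothetical function names, not a standard routine. It builds a CCD in $k = 2$ variables ($F = 4$, so $\alpha = \sqrt{2}$) with five center runs, fits the full second-order model, and evaluates $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$ at equally spaced points on circles around the center. For the rotatable choice the values agree up to rounding error. For the face-centered choice $\alpha = 1$, they depend on direction.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def second_order_terms(x1, x2):
    """f(x) for the full second-order model in two variables."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def ccd(alpha, n_center):
    """Central composite design in k = 2 variables (coded units)."""
    factorial = [(a, b) for a in (-1.0, 1.0) for b in (-1.0, 1.0)]
    axial = [(alpha, 0.0), (-alpha, 0.0), (0.0, alpha), (0.0, -alpha)]
    return factorial + axial + [(0.0, 0.0)] * n_center


def scaled_variance(design, point):
    """f'(x)(X'X)^{-1}f(x), i.e. Var[y_hat(x)]/sigma^2 as in (8.40)."""
    X = [second_order_terms(*run) for run in design]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    fx = second_order_terms(*point)
    return sum(a * b for a, b in zip(fx, solve(XtX, fx)))


def spread_on_circle(design, radius, n_angles=12):
    """Max minus min of the scaled variance over points at a fixed distance from the center."""
    angles = [2 * math.pi * i / n_angles for i in range(n_angles)]
    values = [scaled_variance(design, (radius * math.cos(t), radius * math.sin(t)))
              for t in angles]
    return max(values) - min(values)


F = 4                                   # points in the 2^2 factorial portion
rotatable = ccd(F ** 0.25, n_center=5)  # alpha = F^(1/4) = sqrt(2)
face_centered = ccd(1.0, n_center=5)
for name, design in (("alpha = F^(1/4)", rotatable), ("alpha = 1", face_centered)):
    print(name, [round(spread_on_circle(design, r), 6) for r in (0.5, 1.0, 1.4)])
```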

The Box-Behnken design, introduced in Box and Behnken (1960), is a subset of a $3^k$ factorial design and, in general, requires many fewer points. It also compares favorably with the CCD. A thorough description of this design is given in Box and Draper (1987, Section 15.4).

Other examples of second-order designs are given in Khuri and Cornell 
(1996, Chapter 4). 


8.4.3. Variance and Bias Design Criteria 

We have seen that the minimization of the prediction variance represents an important criterion for the selection of a response surface design. This criterion, however, presumes that the fitted model is correct. In many situations, bias in the predicted response can occur due to fitting the wrong model. We refer to this as model bias.

Box and Draper (1959, 1963) presented convincing arguments in favor of recognizing bias as an important design criterion, in certain cases even more important than the variance criterion.

Consider again model (8.36). The response value at a point $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ in a region $R$ is represented as

$$y(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\boldsymbol{\beta} + \epsilon, \tag{8.47}$$

where $\mathbf{f}'(\mathbf{x})$ is the same as in model (8.38). While it is hoped that model (8.47) is correct, there is always a fear that the true model is different. Let us therefore suppose that in reality the true mean response at $\mathbf{x}$, denoted by $\eta(\mathbf{x})$, is given by

$$\eta(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\boldsymbol{\beta} + \mathbf{g}'(\mathbf{x})\boldsymbol{\delta}, \tag{8.48}$$

where the elements of $\mathbf{g}'(\mathbf{x})$ depend on $\mathbf{x}$ and consist of powers and cross products of powers of $x_1, x_2, \ldots, x_k$ of degree $d' > d$, with $d$ being the order of model (8.47), and $\boldsymbol{\delta}$ is a vector of $q$ unknown parameters. For a given design $\mathbf{D}$ of $n$ experimental runs, we then have the model

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\delta},$$

where $\boldsymbol{\eta}$ is the vector of true means (or expected values) of the elements of $\mathbf{y}$ at the $n$ design points, $\mathbf{X}$ is the same as in model (8.36), and $\mathbf{Z}$ is a matrix of order $n \times q$ whose $u$th row is equal to $\mathbf{g}'(\mathbf{x}_u)$. Here, $\mathbf{x}_u'$ denotes the $u$th row of $\mathbf{D}$ ($u = 1, 2, \ldots, n$).

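At each point $\mathbf{x}$ in $R$, the mean squared error (MSE) of $\hat{y}(\mathbf{x})$, where $\hat{y}(\mathbf{x})$ is the predicted response as given by formula (8.38), is defined as

$$\mathrm{MSE}[\hat{y}(\mathbf{x})] = E[\hat{y}(\mathbf{x}) - \eta(\mathbf{x})]^2.$$

This can be expressed as

$$\mathrm{MSE}[\hat{y}(\mathbf{x})] = \mathrm{Var}[\hat{y}(\mathbf{x})] + \mathrm{Bias}^2[\hat{y}(\mathbf{x})], \tag{8.49}$$

where $\mathrm{Bias}[\hat{y}(\mathbf{x})] = E[\hat{y}(\mathbf{x})] - \eta(\mathbf{x})$. The fundamental philosophy of Box and Draper (1959, 1963) is centered on the integrated mean squared error (IMSE) of $\hat{y}(\mathbf{x})$. This is denoted by $J$ and is defined in terms of a $k$-tuple Riemann integral over the region $R$, namely,

$$J = \frac{n\Omega}{\sigma^2}\int_R \mathrm{MSE}[\hat{y}(\mathbf{x})]\,d\mathbf{x}, \tag{8.50}$$

where $\Omega^{-1} = \int_R d\mathbf{x}$ and $\sigma^2$ is the error variance. The partitioning of $\mathrm{MSE}[\hat{y}(\mathbf{x})]$ as in formula (8.49) enables us to separate $J$ into two parts:

$$J = \frac{n\Omega}{\sigma^2}\int_R \mathrm{Var}[\hat{y}(\mathbf{x})]\,d\mathbf{x} + \frac{n\Omega}{\sigma^2}\int_R \mathrm{Bias}^2[\hat{y}(\mathbf{x})]\,d\mathbf{x} = V + B. \tag{8.51}$$

The quantities $V$ and $B$ are called the average variance and average squared bias of $\hat{y}(\mathbf{x})$, respectively. Both $V$ and $B$ depend on the design $\mathbf{D}$. Thus a reasonable choice of design is one that minimizes (1) $V$ alone, (2) $B$ alone, or (3) $J = V + B$.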

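Now, using formula (8.40), $V$ can be written as

$$V = n\Omega\int_R \mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})\,d\mathbf{x} = \mathrm{tr}\left[(\mathbf{X}'\mathbf{X})^{-1}\, n\Omega\int_R \mathbf{f}(\mathbf{x})\mathbf{f}'(\mathbf{x})\,d\mathbf{x}\right] = \mathrm{tr}\left[n(\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\Gamma}_{11}\right], \tag{8.52}$$

where

$$\boldsymbol{\Gamma}_{11} = \Omega\int_R \mathbf{f}(\mathbf{x})\mathbf{f}'(\mathbf{x})\,d\mathbf{x}.$$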
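Formula (8.52) can be checked against a direct numerical integration of $\mathrm{Var}[\hat{y}(\mathbf{x})]/\sigma^2$. The Python sketch below is a minimal illustration under simple assumptions. It uses a first-order model in one variable, $\mathbf{f}(x) = (1, x)'$, on $R = [-1, 1]$ (so $\Omega = 1/2$), and evaluates the integrals by Simpson's rule. The two candidate designs are made-up examples. For these designs $V$ equals $4/3$ and $3/2$, respectively, so spreading the runs to the ends of $R$ lowers the average variance.

```python
def simpson(fun, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = fun(a) + fun(b) + sum((4 if i % 2 else 2) * fun(a + i * h) for i in range(1, n))
    return total * h / 3


def f(x):
    return [1.0, x]  # first-order model in one variable


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def average_variance(design, a=-1.0, b=1.0):
    """Return V computed by formula (8.52) and by integrating f'(x)(X'X)^{-1}f(x) directly."""
    n, omega = len(design), 1.0 / (b - a)
    xtx = [[sum(f(u)[i] * f(u)[j] for u in design) for j in range(2)] for i in range(2)]
    inv = inverse_2x2(xtx)
    gamma11 = [[omega * simpson(lambda x, i=i, j=j: f(x)[i] * f(x)[j], a, b)
                for j in range(2)] for i in range(2)]
    v_trace = n * sum(inv[i][j] * gamma11[j][i] for i in range(2) for j in range(2))

    def scaled_variance(x):
        fx = f(x)
        return sum(fx[i] * inv[i][j] * fx[j] for i in range(2) for j in range(2))

    v_direct = n * omega * simpson(scaled_variance, a, b)
    return v_trace, v_direct


for design in ([-1.0, 1.0], [-1.0, 0.0, 1.0]):
    print(design, [round(v, 6) for v in average_variance(design)])
```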


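As for $B$, we note from formula (8.37) that

$$E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\eta} = \boldsymbol{\beta} + \mathbf{A}\boldsymbol{\delta},$$

where $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}$. Thus from formula (8.38) we have

$$E[\hat{y}(\mathbf{x})] = \mathbf{f}'(\mathbf{x})(\boldsymbol{\beta} + \mathbf{A}\boldsymbol{\delta}).$$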

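Using the expression for $\eta(\mathbf{x})$ in formula (8.48), $B$ can be written as

$$\begin{aligned} B &= \frac{n\Omega}{\sigma^2}\int_R [\mathbf{f}'(\mathbf{x})\boldsymbol{\beta} + \mathbf{f}'(\mathbf{x})\mathbf{A}\boldsymbol{\delta} - \mathbf{f}'(\mathbf{x})\boldsymbol{\beta} - \mathbf{g}'(\mathbf{x})\boldsymbol{\delta}]^2\,d\mathbf{x} \\ &= \frac{n\Omega}{\sigma^2}\int_R [\mathbf{f}'(\mathbf{x})\mathbf{A}\boldsymbol{\delta} - \mathbf{g}'(\mathbf{x})\boldsymbol{\delta}]^2\,d\mathbf{x} \\ &= \frac{n\Omega}{\sigma^2}\int_R \boldsymbol{\delta}'[\mathbf{A}'\mathbf{f}(\mathbf{x}) - \mathbf{g}(\mathbf{x})][\mathbf{f}'(\mathbf{x})\mathbf{A} - \mathbf{g}'(\mathbf{x})]\boldsymbol{\delta}\,d\mathbf{x} \\ &= \frac{n\Omega}{\sigma^2}\int_R \boldsymbol{\delta}'[\mathbf{A}'\mathbf{f}(\mathbf{x})\mathbf{f}'(\mathbf{x})\mathbf{A} - \mathbf{g}(\mathbf{x})\mathbf{f}'(\mathbf{x})\mathbf{A} - \mathbf{A}'\mathbf{f}(\mathbf{x})\mathbf{g}'(\mathbf{x}) + \mathbf{g}(\mathbf{x})\mathbf{g}'(\mathbf{x})]\boldsymbol{\delta}\,d\mathbf{x} \\ &= \frac{n}{\sigma^2}\boldsymbol{\delta}'\boldsymbol{\Delta}\boldsymbol{\delta}, \end{aligned} \tag{8.53}$$

where

$$\boldsymbol{\Delta} = \mathbf{A}'\boldsymbol{\Gamma}_{11}\mathbf{A} - \boldsymbol{\Gamma}_{12}'\mathbf{A} - \mathbf{A}'\boldsymbol{\Gamma}_{12} + \boldsymbol{\Gamma}_{22},$$

$$\boldsymbol{\Gamma}_{12} = \Omega\int_R \mathbf{f}(\mathbf{x})\mathbf{g}'(\mathbf{x})\,d\mathbf{x}, \qquad \boldsymbol{\Gamma}_{22} = \Omega\int_R \mathbf{g}(\mathbf{x})\mathbf{g}'(\mathbf{x})\,d\mathbf{x}.$$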

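The matrices $\boldsymbol{\Gamma}_{11}$, $\boldsymbol{\Gamma}_{12}$, and $\boldsymbol{\Gamma}_{22}$ are called region moments. By adding and subtracting the matrix $\boldsymbol{\Gamma}_{12}'\boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12}$ in formula (8.53), $B$ can be expressed as

$$B = \frac{n}{\sigma^2}\boldsymbol{\delta}'\left[(\boldsymbol{\Gamma}_{22} - \boldsymbol{\Gamma}_{12}'\boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12}) + (\mathbf{A} - \boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12})'\boldsymbol{\Gamma}_{11}(\mathbf{A} - \boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12})\right]\boldsymbol{\delta}. \tag{8.54}$$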





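We note that the design $\mathbf{D}$ affects only the second expression inside brackets on the right-hand side of formula (8.54). Since $\boldsymbol{\Gamma}_{11}$ is positive definite, this expression contributes a nonnegative amount to $B$ that vanishes when $\mathbf{A} = \boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12}$. Thus to minimize $B$, the design $\mathbf{D}$ should be chosen such that

$$\mathbf{A} - \boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12} = \mathbf{0}. \tag{8.55}$$

Since $\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}$, a sufficient (but not necessary) condition for the minimization of $B$ is

$$\mathbf{M}_{11} = \boldsymbol{\Gamma}_{11}, \qquad \mathbf{M}_{12} = \boldsymbol{\Gamma}_{12}, \tag{8.56}$$

where $\mathbf{M}_{11} = (1/n)\mathbf{X}'\mathbf{X}$ and $\mathbf{M}_{12} = (1/n)\mathbf{X}'\mathbf{Z}$ are the so-called design moments; indeed, $\mathbf{A} = \mathbf{M}_{11}^{-1}\mathbf{M}_{12}$, which equals $\boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12}$ under (8.56). Thus a sufficient condition for the minimization of $B$ is the equality of the design moments, $\mathbf{M}_{11}$ and $\mathbf{M}_{12}$, to the corresponding region moments, $\boldsymbol{\Gamma}_{11}$ and $\boldsymbol{\Gamma}_{12}$.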
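A small case makes (8.53) through (8.56) concrete. Suppose, purely for illustration, that a straight line $\mathbf{f}(x) = (1, x)'$ is fitted on $R = [-1, 1]$ while the true mean also contains a quadratic term, $\mathbf{g}(x) = x^2$ (so $q = 1$). The region moments are then $\boldsymbol{\Gamma}_{11} = \mathrm{Diag}(1, 1/3)$, $\boldsymbol{\Gamma}_{12} = (1/3, 0)'$, and $\boldsymbol{\Gamma}_{22} = 1/5$. The minimum of $\boldsymbol{\Delta}$ is therefore $\boldsymbol{\Gamma}_{22} - \boldsymbol{\Gamma}_{12}'\boldsymbol{\Gamma}_{11}^{-1}\boldsymbol{\Gamma}_{12} = 1/5 - 1/9 = 4/45$. The Python sketch below computes $\boldsymbol{\Delta}$ for hypothetical designs. The two-point design at $\pm 1/\sqrt{3}$ matches the region moments as in (8.56) and attains the minimum, while the design at $\pm 1$ gives $\boldsymbol{\Delta} = 8/15$.

```python
import math


def region_moment(m):
    """Omega * integral of x^m over [-1, 1], with Omega = 1/2."""
    return 0.0 if m % 2 else 1.0 / (m + 1)


def f(x):
    return [1.0, x]  # fitted first-order model


def g(x):
    return x * x     # omitted quadratic term (q = 1)


def delta(design):
    """Delta = A'G11 A - G12'A - A'G12 + G22 for scalar delta, as in (8.53)."""
    n = len(design)
    M11 = [[sum(f(u)[i] * f(u)[j] for u in design) / n for j in range(2)] for i in range(2)]
    M12 = [sum(f(u)[i] * g(u) for u in design) / n for i in range(2)]
    det = M11[0][0] * M11[1][1] - M11[0][1] * M11[1][0]
    A = [(M11[1][1] * M12[0] - M11[0][1] * M12[1]) / det,  # A = M11^{-1} M12
         (M11[0][0] * M12[1] - M11[1][0] * M12[0]) / det]
    G11 = [[region_moment(i + j) for j in range(2)] for i in range(2)]
    G12 = [region_moment(i + 2) for i in range(2)]
    G22 = region_moment(4)
    quad = sum(A[i] * G11[i][j] * A[j] for i in range(2) for j in range(2))
    cross = sum(G12[i] * A[i] for i in range(2))
    return quad - 2.0 * cross + G22


for design in ([-1.0, 1.0], [-1 / math.sqrt(3), 1 / math.sqrt(3)], [-0.8, 0.0, 0.8]):
    print([round(u, 4) for u in design], round(delta(design), 6))
```

Since $B = (n/\sigma^2)\boldsymbol{\delta}'\boldsymbol{\Delta}\boldsymbol{\delta}$, the ranking of designs by $\boldsymbol{\Delta}$ here does not depend on the unknown $\boldsymbol{\delta}$.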

The minimization of $J = V + B$ is not possible without the specification of $\boldsymbol{\delta}/\sigma$. Box and Draper (1959, 1963) showed that unless $V$ is considerably larger than $B$, the optimal design that minimizes $J$ has characteristics similar to those of a design that minimizes just $B$.

Examples of designs that minimize $V$ alone or $B$ alone can be found in Box and Draper (1987, Chapter 13), Khuri and Cornell (1996, Chapter 6), and Myers (1976, Chapter 9).


8.5. ALPHABETIC OPTIMALITY OF DESIGNS 

Let us again consider model (8.47), which we now assume to be correct, that is, the true mean response, $\eta(\mathbf{x})$, is equal to $\mathbf{f}'(\mathbf{x})\boldsymbol{\beta}$. In this case, the matrix $\mathbf{X}'\mathbf{X}$ plays an important role in the determination of an optimal design, since the elements of $(\mathbf{X}'\mathbf{X})^{-1}$ are proportional to the variances and covariances of the least-squares estimators of the model's parameters [see formula (8.39)].

The mathematical theory of optimal designs, which was developed by Kiefer (1958, 1959, 1960, 1961, 1962a, b), is concerned with the choice of designs that minimize certain functions of the elements of $(\mathbf{X}'\mathbf{X})^{-1}$. The kernel of Kiefer's approach is the concept of a design measure, which represents a generalization of the traditional design concept. So far, each of the designs that we have considered for fitting a response surface model has consisted of a set of $n$ points in a $k$-dimensional space ($k \ge 1$). Suppose that $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ are the distinct points of an $n$-point design ($m \le n$), with the $l$th point ($l = 1, 2, \ldots, m$) being replicated $n_l$ ($\ge 1$) times (that is, $n_l$ repeated observations are taken at this point). The design can therefore be regarded as a collection of points in a region of interest $R$ with the $l$th point being assigned the weight $n_l/n$ ($l = 1, 2, \ldots, m$), where $n = \sum_{l=1}^{m} n_l$. Kiefer generalized this setup using the so-called continuous design measure, which is basically a probability measure $\xi(\mathbf{x})$ defined on $R$ that satisfies the conditions

$$\xi(\mathbf{x}) \ge 0 \ \text{ for all } \mathbf{x} \in R, \qquad \int_R d\xi(\mathbf{x}) = 1.$$

In particular, the measure induced by a traditional design $\mathbf{D}$ with $n$ points is called a discrete design measure and is denoted by $\xi_n$. It should be noted that while a discrete design measure is realizable in practice, the same is not true of a general continuous design measure. For this reason, the former design is called exact and the latter design is called approximate.

By definition, the moment matrix of a design measure $\xi$ is a symmetric matrix of the form $\mathbf{M}(\xi) = [m_{ij}(\xi)]$, where

$$m_{ij}(\xi) = \int_R f_i(\mathbf{x})f_j(\mathbf{x})\,d\xi(\mathbf{x}). \tag{8.57}$$

Here, $f_i(\mathbf{x})$ is the $i$th element of $\mathbf{f}(\mathbf{x})$ in formula (8.47), $i = 1, 2, \ldots, p$. For a discrete design measure $\xi_n$, the $(i, j)$th element of the moment matrix is

$$m_{ij}(\xi_n) = \frac{1}{n}\sum_{l=1}^{m} n_l f_i(\mathbf{x}_l)f_j(\mathbf{x}_l), \tag{8.58}$$

where $m$ is the number of distinct design points and $n_l$ is the number of replications at the $l$th point ($l = 1, 2, \ldots, m$). In this special case, the matrix $\mathbf{M}(\xi_n)$ reduces to the usual moment matrix $(1/n)\mathbf{X}'\mathbf{X}$, where $\mathbf{X}$ is the same matrix as in formula (8.36).

For a general design measure $\xi$, the standardized prediction variance, denoted by $d(\mathbf{x}, \xi)$, is defined as

$$d(\mathbf{x}, \xi) = \mathbf{f}'(\mathbf{x})[\mathbf{M}(\xi)]^{-1}\mathbf{f}(\mathbf{x}), \tag{8.59}$$

where $\mathbf{M}(\xi)$ is assumed to be nonsingular. In particular, for a discrete design measure $\xi_n$, the prediction variance in formula (8.40) is equal to $(\sigma^2/n)\,d(\mathbf{x}, \xi_n)$.

Let $H$ denote the class of all design measures defined on the region $R$. A prominent design criterion that has received a great deal of attention is that of $D$-optimality, in which the determinant of $\mathbf{M}(\xi)$ is maximized. Thus a design measure $\xi_d$ is $D$-optimal if

$$\det[\mathbf{M}(\xi_d)] = \sup_{\xi \in H}\det[\mathbf{M}(\xi)]. \tag{8.60}$$

The rationale behind this criterion has to do with the minimization of the generalized variance of the least-squares estimator $\hat{\boldsymbol{\beta}}$ of the parameter vector $\boldsymbol{\beta}$. By definition, the generalized variance of $\hat{\boldsymbol{\beta}}$ is the determinant of the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$. This is based on the fact that under the normality assumption, the content (volume) of a fixed-level confidence region on $\boldsymbol{\beta}$ is proportional to $[\det(\mathbf{X}'\mathbf{X})]^{-1/2}$. The review articles by St. John and Draper (1975), Ash and Hedayat (1978), and Atkinson (1982, 1988) contain many references on $D$-optimality.

Another design criterion that is closely related to $D$-optimality is $G$-optimality, which is concerned with the prediction variance. By definition, a design measure $\xi_g$ is $G$-optimal if it minimizes over $H$ the maximum standardized prediction variance over the region $R$, that is,

$$\sup_{\mathbf{x} \in R} d(\mathbf{x}, \xi_g) = \inf_{\xi \in H}\left\{\sup_{\mathbf{x} \in R} d(\mathbf{x}, \xi)\right\}. \tag{8.61}$$

Kiefer and Wolfowitz (1960) showed that $D$-optimality and $G$-optimality, as defined by formulas (8.60) and (8.61), are equivalent. Furthermore, a design measure $\xi^*$ is $G$-optimal (or $D$-optimal) if and only if

$$\sup_{\mathbf{x} \in R} d(\mathbf{x}, \xi^*) = p, \tag{8.62}$$

where $p$ is the number of parameters in the model. Formula (8.62) can be conveniently used to determine if a given design measure is $D$-optimal. In general, $\sup_{\mathbf{x} \in R} d(\mathbf{x}, \xi) \ge p$ for any design measure $\xi \in H$, because the average of $d(\mathbf{x}, \xi)$ with respect to $\xi$ is $\mathrm{tr}\{[\mathbf{M}(\xi)]^{-1}\mathbf{M}(\xi)\} = p$. If equality can be achieved by a design measure, then it must be $G$-optimal, and hence $D$-optimal.

Example 8.5.1. Consider fitting a second-order model in one input variable $x$ over the region $R = [-1, 1]$. In this case, model (8.47) takes the form $y(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$, that is, $\mathbf{f}'(x) = (1, x, x^2)$. Suppose that the design measure used is defined as

$$\xi(x) = \begin{cases} \tfrac{1}{3}, & x = -1, 0, 1, \\ 0 & \text{otherwise}. \end{cases} \tag{8.63}$$

Thus $\xi$ is a discrete design measure that assigns one-third of the experimental runs to each of the points $-1$, $0$, and $1$. This design measure is $D$-optimal. To verify this claim, we first need to determine the values of the elements of the moment matrix $\mathbf{M}(\xi)$. Using formula (8.58) with $n_l/n = \tfrac{1}{3}$ for $l = 1, 2, 3$, we find that $m_{11} = 1$, $m_{12} = 0$, $m_{13} = \tfrac{2}{3}$, $m_{22} = \tfrac{2}{3}$, $m_{23} = 0$, and $m_{33} = \tfrac{2}{3}$. Hence,

$$\mathbf{M}(\xi) = \begin{bmatrix} 1 & 0 & \tfrac{2}{3} \\ 0 & \tfrac{2}{3} & 0 \\ \tfrac{2}{3} & 0 & \tfrac{2}{3} \end{bmatrix}, \qquad \mathbf{M}^{-1}(\xi) = \begin{bmatrix} 3 & 0 & -3 \\ 0 & \tfrac{3}{2} & 0 \\ -3 & 0 & \tfrac{9}{2} \end{bmatrix}.$$

By applying formula (8.59) we find that $d(x, \xi) = 3 - \tfrac{9}{2}x^2 + \tfrac{9}{2}x^4$, $-1 \le x \le 1$. Since $x^2 - x^4 \ge 0$ on $[-1, 1]$, we have $d(x, \xi) \le 3$ for all $x$ in $[-1, 1]$, with $d(0, \xi) = 3$. Thus $\sup_{x \in R} d(x, \xi) = 3$. Since 3 is the number of parameters in the model, then by condition (8.62) we conclude that the design measure defined by formula (8.63) is $D$-optimal.
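The computations in Example 8.5.1 can be reproduced exactly with rational arithmetic. The Python sketch below uses the standard-library `fractions` module and a small Gauss-Jordan routine of our own (the helper names are illustrative). It forms $\mathbf{M}(\xi)$ from (8.58), inverts it, and checks on a grid of 201 points in $[-1, 1]$ that $d(x, \xi)$ equals $3 - \tfrac{9}{2}x^2 + \tfrac{9}{2}x^4$ and never exceeds $p = 3$.

```python
from fractions import Fraction


def inverse(a):
    """Exact inverse by Gauss-Jordan elimination (entries are Fractions)."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def f(x):
    return [Fraction(1), x, x * x]  # f'(x) = (1, x, x^2)


points = [Fraction(-1), Fraction(0), Fraction(1)]
weights = [Fraction(1, 3)] * 3  # the design measure (8.63)
M = [[sum(w * f(x)[i] * f(x)[j] for x, w in zip(points, weights)) for j in range(3)]
     for i in range(3)]  # formula (8.58)
M_inv = inverse(M)


def d(x):
    """Standardized prediction variance d(x, xi), formula (8.59)."""
    fx = f(x)
    return sum(fx[i] * M_inv[i][j] * fx[j] for i in range(3) for j in range(3))


grid = [Fraction(i, 100) for i in range(-100, 101)]
print("M^{-1} =", [[str(v) for v in row] for row in M_inv])
print("max of d on the grid:", max(d(x) for x in grid))
```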

In addition to the $D$- and $G$-optimality criteria, other variance-related design criteria have also been investigated. These include $A$- and $E$-optimality. By definition, a design measure is $A$-optimal if it minimizes the trace of $[\mathbf{M}(\xi)]^{-1}$. This is equivalent to minimizing the sum of the variances of the least-squares estimators of the fitted model's parameters. In $E$-optimality, the smallest eigenvalue of $\mathbf{M}(\xi)$ is maximized. The rationale behind this criterion is based on the fact that

$$d(\mathbf{x}, \xi) \le \frac{\mathbf{f}'(\mathbf{x})\mathbf{f}(\mathbf{x})}{\lambda_{\min}},$$

as can be seen from formula (8.59), where $\lambda_{\min}$ is the smallest eigenvalue of $\mathbf{M}(\xi)$. Hence, $d(\mathbf{x}, \xi)$ can be reduced by maximizing $\lambda_{\min}$.

The efficiency of a design measure $\xi$ with respect to a D-optimal design is defined as

$$D\text{-efficiency} = \left\{\frac{\det[\mathbf{M}(\xi)]}{\sup_{\xi}\det[\mathbf{M}(\xi)]}\right\}^{1/p},$$

where the supremum is taken over all design measures and p is the number of parameters in the model. Similarly, the G-efficiency of $\xi$ is defined as

$$G\text{-efficiency} = \frac{p}{\sup_{\mathbf{x}\in R} d(\mathbf{x}, \xi)}.$$

Both D- and G-efficiency values fall within the interval [0, 1]. The closer these values are to 1, the more efficient their corresponding designs are. Lucas (1976) compared several second-order designs (such as central composite and Box-Behnken designs) on the basis of their D- and G-efficiency values.
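To make the A-, D-, and E-criteria and the two efficiency measures concrete, the following sketch evaluates them for the D-optimal design (8.63) and for a hypothetical competitor that places weight 1/4 on each of $-1, -\tfrac13, \tfrac13, 1$. It assumes, from Example 8.5.1, that $\sup_\xi \det[\mathbf{M}(\xi)] = 4/27$ for the second-order model on $[-1, 1]$, and it approximates $\sup_{x\in R} d(x, \xi)$ by a grid maximum.

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])   # second-order model in one variable

def design_criteria(points, weights, grid, det_opt, p=3):
    """A-, D-, E-criteria plus D- and G-efficiency of a discrete design measure."""
    M = sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))
    M_inv = np.linalg.inv(M)
    d_sup = max(f(x) @ M_inv @ f(x) for x in grid)   # grid approximation of sup over R
    det_M = np.linalg.det(M)
    return {
        "A": np.trace(M_inv),                 # smaller is better
        "D": det_M,                           # larger is better
        "E": np.linalg.eigvalsh(M).min(),     # larger is better
        "D_eff": (det_M / det_opt) ** (1.0 / p),
        "G_eff": p / d_sup,
    }

grid = np.linspace(-1.0, 1.0, 2001)
det_opt = 4.0 / 27.0                         # det M(xi) for the D-optimal design (8.63)
optimal = design_criteria([-1.0, 0.0, 1.0], [1 / 3] * 3, grid, det_opt)
competitor = design_criteria([-1.0, -1 / 3, 1 / 3, 1.0], [0.25] * 4, grid, det_opt)
for name in ("A", "D", "E", "D_eff", "G_eff"):
    print(f"{name:6s} {optimal[name]:.4f} {competitor[name]:.4f}")
```

For the competitor, the D-efficiency works out to $(20/27)^{1/3} \approx 0.905$, and its largest standardized variance occurs at $x = \pm 1$, where $d = 3.8$, giving a G-efficiency of about 0.79.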

The equivalence theorem of Kiefer and Wolfowitz (1960) can be applied to construct a D-optimal design using a sequential procedure. This procedure is described in Wynn (1970, 1972) and goes as follows: Let $D_{n_0}$ denote an initial response surface design with $n_0$ points, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{n_0}$, for which the matrix $\mathbf{X}'\mathbf{X}$ is nonsingular. A point $\mathbf{x}_{n_0+1}$ is found in the region R such that

$$d(\mathbf{x}_{n_0+1}, \xi_{n_0}) = \sup_{\mathbf{x}\in R} d(\mathbf{x}, \xi_{n_0}),$$

where $\xi_{n_0}$ is the discrete design measure that represents $D_{n_0}$. By augmenting $D_{n_0}$ with $\mathbf{x}_{n_0+1}$ we obtain the design $D_{n_0+1}$. Then, another point $\mathbf{x}_{n_0+2}$ is chosen such that

$$d(\mathbf{x}_{n_0+2}, \xi_{n_0+1}) = \sup_{\mathbf{x}\in R} d(\mathbf{x}, \xi_{n_0+1}),$$

where $\xi_{n_0+1}$ is the discrete design measure that represents $D_{n_0+1}$. The point $\mathbf{x}_{n_0+2}$ is added to $D_{n_0+1}$ to obtain the design $D_{n_0+2}$. By continuing this process we obtain a sequence of discrete design measures, namely, $\xi_{n_0+1}, \xi_{n_0+2}, \ldots$. Wynn (1970) showed that this sequence converges to the D-optimal design $\xi_d$, that is,

$$\det[\mathbf{M}(\xi_{n_0+n})] \to \det[\mathbf{M}(\xi_d)]$$

as $n \to \infty$. An example is given in Wynn (1970, Section 5) to illustrate this sequential procedure.
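A minimal sketch of Wynn's sequential procedure for the quadratic model of Example 8.5.1, assuming that R = [-1, 1] is replaced by a grid of 201 candidate points and that the initial design $D_{n_0}$ is the hypothetical three-point design {-0.5, 0.1, 0.8}. At each step the candidate with the largest $d(x, \xi_n)$ is appended, and the determinant of the normalized moment matrix $\mathbf{X}'\mathbf{X}/n$ is reported at the end.

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])   # second-order model in one variable

def wynn(initial_points, candidates, n_steps):
    """Repeatedly add the candidate that maximizes d(x, xi_n), as in Wynn's procedure."""
    X = np.array([f(x) for x in initial_points])          # rows of the current design
    F = np.array([f(x) for x in candidates])              # rows for every candidate
    points = list(initial_points)
    for _ in range(n_steps):
        n = X.shape[0]
        M_inv = np.linalg.inv(X.T @ X / n)                # M^{-1}(xi_n)
        d = np.einsum("ij,jk,ik->i", F, M_inv, F)         # d(x, xi_n) for all candidates
        x_new = candidates[np.argmax(d)]
        points.append(x_new)
        X = np.vstack([X, f(x_new)])
    n = X.shape[0]
    return points, np.linalg.det(X.T @ X / n)

candidates = np.linspace(-1.0, 1.0, 201)                  # grid standing in for R = [-1, 1]
points, det_final = wynn([-0.5, 0.1, 0.8], candidates, 1000)
print(det_final, 4 / 27)
```

Convergence of this kind of procedure is slow, since each new point carries weight $1/(n+1)$; after 1000 additions $\det[\mathbf{M}]$ should be close to, but never above, the optimal value $4/27 \approx 0.1481$.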

The four design criteria, A-, D-, E-, and G-optimality, are referred to as alphabetic optimality. More detailed information about these criteria can be found in Atkinson (1982, 1988), Fedorov (1972), Pazman (1986), and Silvey (1980). Recall that to perform an actual experiment, one must use a discrete design. It is possible to find a discrete design measure that approximates an optimal design measure. The approximation is good whenever n is large with respect to p (the number of parameters in the model).

Note that the equivalence theorem of Kiefer and Wolfowitz (1960) applies to general design measures and not necessarily to discrete design measures, that is, D- and G-optimality criteria are not equivalent for the class of discrete design measures. Optimal n-point discrete designs, however, can still be found on the basis of maximizing the determinant of $\mathbf{X}'\mathbf{X}$, for example. In this case, finding an optimal n-point design requires a search involving nk variables, where k is the number of input variables. Several algorithms have been introduced for this purpose. For example, the DETMAX algorithm by Mitchell (1974) is used to maximize $\det(\mathbf{X}'\mathbf{X})$. A review of algorithms for constructing optimal discrete designs can be found in Cook and Nachtsheim (1980) (see also Johnson and Nachtsheim, 1983).

One important criticism of the alphabetic optimality approach is that it is set within a rigid framework governed by a set of assumptions. For example, a specific model for the response function must be assumed as the "true" model. Optimal design measures can be quite sensitive to this assumption. Box (1982) presented a critique of this approach. He argued that in a response surface situation, it may not be realistic to assume that a model such as (8.47) represents the true response function exactly. Some protection against bias in the model should therefore be considered when choosing a response surface design. On the other hand, Kiefer (1975) criticized certain aspects of the preoccupation with bias, pointing out examples in which the variance criterion is compromised for the sake of the bias criterion. It follows that design selection should be guided by more than one single criterion (see Kiefer, 1975, page 286; Box, 1982, Section 7). A reasonable approach is to select compromise designs that are sufficiently good (but not necessarily optimal) from the viewpoint of several criteria that are important to the user.


8.6. DESIGNS FOR NONLINEAR MODELS 


The models we have considered so far in the area of response surface methodology were linear in the parameters; hence the term linear models. There are, however, many experimental situations in which linear models do not adequately represent the true mean response. For example, the growth of an organism is more appropriately depicted by a nonlinear model. By definition, a nonlinear model is one of the form

$$y(\mathbf{x}) = h(\mathbf{x}, \boldsymbol\theta) + \epsilon, \qquad (8.64)$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ is a vector of k input variables, $\boldsymbol\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ is a vector of p unknown parameters, $\epsilon$ is a random error, and $h(\mathbf{x}, \boldsymbol\theta)$ is a known function, nonlinear in at least one element of $\boldsymbol\theta$. An example of a nonlinear model is

$$h(x, \boldsymbol\theta) = \frac{\theta_1 x}{\theta_2 + x}.$$

Here, $\boldsymbol\theta = (\theta_1, \theta_2)'$ and $\theta_2$ is a nonlinear parameter. This particular model is known as the Michaelis-Menten model for enzyme kinetics. It relates the initial velocity of an enzymatic reaction to the substrate concentration x.

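In contrast to linear models, nonlinear models have not received a great deal of attention in response surface methodology, especially in the design area. The main design criterion for nonlinear models is the D-optimality criterion, which actually applies to a linearized form of the nonlinear model. More specifically, this criterion depends on the assumption that in some neighborhood of a specified value $\boldsymbol\theta_0$ of $\boldsymbol\theta$, the function $h(\mathbf{x}, \boldsymbol\theta)$ is approximately linear in $\boldsymbol\theta$. In this case, a first-order Taylor's expansion of $h(\mathbf{x}, \boldsymbol\theta)$ yields the following approximation of $h(\mathbf{x}, \boldsymbol\theta)$:

$$h(\mathbf{x}, \boldsymbol\theta) \approx h(\mathbf{x}, \boldsymbol\theta_0) + \sum_{i=1}^{p} \frac{\partial h(\mathbf{x}, \boldsymbol\theta_0)}{\partial \theta_i}(\theta_i - \theta_{i0}).$$

Thus if $\boldsymbol\theta$ is close enough to $\boldsymbol\theta_0$, then we have approximately the linear model

$$z(\mathbf{x}) = \sum_{i=1}^{p} \psi_i \frac{\partial h(\mathbf{x}, \boldsymbol\theta_0)}{\partial \theta_i} + \epsilon, \qquad (8.65)$$

where $z(\mathbf{x}) = y(\mathbf{x}) - h(\mathbf{x}, \boldsymbol\theta_0)$, and $\psi_i$ is the ith element of $\boldsymbol\psi = \boldsymbol\theta - \boldsymbol\theta_0$ ($i = 1, 2, \ldots, p$).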
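The quality of the first-order approximation behind (8.65) can be checked numerically. The sketch below uses the Michaelis-Menten function with a hypothetical $\boldsymbol\theta_0 = (2, 0.5)'$ and a hypothetical direction for $\boldsymbol\psi$ (neither comes from the text); halving the distance from $\boldsymbol\theta_0$ should cut the linearization error by roughly a factor of four, as expected of a remainder that is quadratic in $\boldsymbol\psi$.

```python
import numpy as np

def h(x, theta):
    """Michaelis-Menten mean response theta1 * x / (theta2 + x)."""
    return theta[0] * x / (theta[1] + x)

def grad_h(x, theta):
    """(dh/dtheta1, dh/dtheta2) evaluated at theta."""
    return np.array([x / (theta[1] + x), -theta[0] * x / (theta[1] + x) ** 2])

def linearized(x, theta, theta0):
    """First-order Taylor approximation of h(x, theta) about theta0."""
    return h(x, theta0) + grad_h(x, theta0) @ (theta - theta0)

theta0 = np.array([2.0, 0.5])          # hypothetical theta_0
psi = np.array([0.1, -0.05])           # hypothetical psi = theta - theta_0
x = 1.5
errors = []
for t in (1.0, 0.5, 0.25):
    theta = theta0 + t * psi
    errors.append(abs(h(x, theta) - linearized(x, theta, theta0)))
print(errors)
print([errors[k] / errors[k + 1] for k in range(2)])   # ratios should be close to 4
```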





For a given design consisting of n experimental runs, model (8.65) can be written in vector form as

$$\mathbf{z} = \mathbf{H}(\boldsymbol\theta_0)\boldsymbol\psi + \boldsymbol\epsilon, \qquad (8.66)$$

where $\mathbf{H}(\boldsymbol\theta_0)$ is an $n \times p$ matrix whose (u, i)th element is $\partial h(\mathbf{x}_u, \boldsymbol\theta_0)/\partial\theta_i$, with $\mathbf{x}_u$ being the vector of design settings for the k input variables at the uth experimental run ($i = 1, 2, \ldots, p$; $u = 1, 2, \ldots, n$). Using the linearized form given by model (8.66), a design is chosen to maximize the determinant $\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)]$. This is known as the Box-Lucas criterion (see Box and Lucas, 1959).

It can be easily seen that a nonlinear design obtained on the basis of the Box-Lucas criterion depends on the value of $\boldsymbol\theta_0$. This is an undesirable characteristic of nonlinear models, since a design is supposed to be used for estimating the unknown parameter vector $\boldsymbol\theta$. By contrast, designs for linear models are not dependent on the fitted model's parameters. Several procedures have been proposed for dealing with the problem of design dependence on the parameters of a nonlinear model. These procedures are mentioned in the review article by Myers, Khuri, and Carter (1989). See also Khuri and Cornell (1996, Section 10.5).

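Example 8.6.1. Let us again consider the Michaelis-Menten model mentioned earlier. The partial derivatives of $h(x, \boldsymbol\theta)$ with respect to $\theta_1$ and $\theta_2$ are

$$\frac{\partial h(x, \boldsymbol\theta)}{\partial \theta_1} = \frac{x}{\theta_2 + x}, \qquad \frac{\partial h(x, \boldsymbol\theta)}{\partial \theta_2} = \frac{-\theta_1 x}{(\theta_2 + x)^2}.$$

Suppose that it is desired to find a two-point design that consists of the settings $x_1$ and $x_2$ using the Box-Lucas criterion. In this case,

$$\mathbf{H}(\boldsymbol\theta_0) = \begin{bmatrix} \dfrac{\partial h(x_1, \boldsymbol\theta_0)}{\partial \theta_1} & \dfrac{\partial h(x_1, \boldsymbol\theta_0)}{\partial \theta_2} \\[2ex] \dfrac{\partial h(x_2, \boldsymbol\theta_0)}{\partial \theta_1} & \dfrac{\partial h(x_2, \boldsymbol\theta_0)}{\partial \theta_2} \end{bmatrix} = \begin{bmatrix} \dfrac{x_1}{\theta_{20} + x_1} & \dfrac{-\theta_{10} x_1}{(\theta_{20} + x_1)^2} \\[2ex] \dfrac{x_2}{\theta_{20} + x_2} & \dfrac{-\theta_{10} x_2}{(\theta_{20} + x_2)^2} \end{bmatrix},$$

where $\theta_{10}$ and $\theta_{20}$ are the elements of $\boldsymbol\theta_0$. In this example, $\mathbf{H}(\boldsymbol\theta_0)$ is a square matrix. Hence,

$$\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)] = \{\det[\mathbf{H}(\boldsymbol\theta_0)]\}^2 = \frac{\theta_{10}^2 x_1^2 x_2^2 (x_2 - x_1)^2}{(\theta_{20} + x_1)^4 (\theta_{20} + x_2)^4}. \qquad (8.67)$$

To determine the maximum of this determinant, let us first equate its partial derivatives with respect to $x_1$ and $x_2$ to zero. It can be verified that the solution of the resulting equations (that is, the stationary point) falls outside the region of feasible values for $x_1$ and $x_2$ (both $x_1$ and $x_2$ must be nonnegative). Let us therefore restrict our search for the maximum to the region $R = \{(x_1, x_2) \mid 0 \le x_1 \le x_{\max},\ 0 \le x_2 \le x_{\max}\}$, where $x_{\max}$ is the maximum allowable substrate concentration. Since the partial derivatives of the determinant in formula (8.67) do not vanish in R, its maximum must be attained on the boundary of R. On $x_1 = 0$ or $x_2 = 0$, the value of the determinant is zero. If $x_1 = x_{\max}$, then

$$\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)] = \frac{\theta_{10}^2 x_{\max}^2 x_2^2 (x_2 - x_{\max})^2}{(\theta_{20} + x_{\max})^4 (\theta_{20} + x_2)^4}.$$

It can be verified that this function of $x_2$ has a maximum at the point $x_2 = \theta_{20} x_{\max}/(2\theta_{20} + x_{\max})$ with a value given by

$$\max_{x_1 = x_{\max}} \{\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)]\} = \frac{\theta_{10}^2 x_{\max}^6}{16\,\theta_{20}^2 (\theta_{20} + x_{\max})^6}. \qquad (8.68)$$

Similarly, if $x_2 = x_{\max}$, then

$$\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)] = \frac{\theta_{10}^2 x_{\max}^2 x_1^2 (x_{\max} - x_1)^2}{(\theta_{20} + x_1)^4 (\theta_{20} + x_{\max})^4},$$

which attains the same maximum value as in formula (8.68) at the point $x_1 = \theta_{20} x_{\max}/(2\theta_{20} + x_{\max})$. We conclude that the maximum of $\det[\mathbf{H}'(\boldsymbol\theta_0)\mathbf{H}(\boldsymbol\theta_0)]$ over the region R is achieved when $x_1 = x_{\max}$ and $x_2 = \theta_{20} x_{\max}/(2\theta_{20} + x_{\max})$, or when $x_1 = \theta_{20} x_{\max}/(2\theta_{20} + x_{\max})$ and $x_2 = x_{\max}$.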

We can clearly see in this example the dependence of the design settings on $\theta_2$, but not on $\theta_1$. This is attributed to the fact that $\theta_1$ appears linearly in the model, but $\theta_2$ does not. In this case, the model is said to be partially nonlinear. Its D-optimal design depends only on those parameters that do not appear linearly. More details concerning partially nonlinear models can be found in Khuri and Cornell (1996, Section 10.5.3).
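The closed-form solution of Example 8.6.1 can be cross-checked by brute force. The sketch assumes illustrative values $\theta_{10} = 2$, $\theta_{20} = 0.5$, and $x_{\max} = 4$ (none of which come from the text), builds $\mathbf{H}(\boldsymbol\theta_0)$ from the partial derivatives, and searches a grid of two-point designs for the largest value of (8.67).

```python
import numpy as np

def jacobian_row(x, theta1, theta2):
    """Row of H(theta_0): partial derivatives of theta1*x/(theta2 + x)."""
    return np.array([x / (theta2 + x), -theta1 * x / (theta2 + x) ** 2])

def box_lucas(points, theta1, theta2):
    """Box-Lucas criterion det[H'(theta_0) H(theta_0)] for the given design points."""
    H = np.array([jacobian_row(x, theta1, theta2) for x in points])
    return np.linalg.det(H.T @ H)

theta10, theta20, x_max = 2.0, 0.5, 4.0          # illustrative values only
grid = np.linspace(0.0, x_max, 401)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
# Formula (8.67) evaluated for every pair (x1, x2) on the grid
crit = theta10**2 * X1**2 * X2**2 * (X2 - X1) ** 2 / (
    (theta20 + X1) ** 4 * (theta20 + X2) ** 4)
i, j = np.unravel_index(np.argmax(crit), crit.shape)
best = sorted([grid[i], grid[j]])

x_star = theta20 * x_max / (2 * theta20 + x_max)                                 # 0.4 here
max_formula = theta10**2 * x_max**6 / (16 * theta20**2 * (theta20 + x_max) ** 6)  # (8.68)
print(best, x_star)
print(box_lucas(best, theta10, theta20), crit[i, j], max_formula)
```

Changing theta10 only rescales the criterion and leaves the maximizing pair unchanged, in line with the remark above on partially nonlinear models.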





8.7. MULTIRESPONSE OPTIMIZATION 

By definition, a multiresponse experiment is one in which a number of responses can be measured for each setting of a group of input variables. For example, in a skim milk extrusion process, the responses $y_1$ = percent residual lactose and $y_2$ = percent ash are known to depend on the input variables $x_1$ = pH level, $x_2$ = temperature, $x_3$ = concentration, and $x_4$ = time (see Fichtali, Van De Voort, and Khuri, 1990).

As in single-response experiments, one of the objectives of a multiresponse experiment is the determination of conditions on the input variables that optimize the predicted responses. The definition of an optimum in a multiresponse situation, however, is more complex than in the single-response case. The reason for this is that when two or more response variables are considered simultaneously, the meaning of an optimum becomes unclear, since there is no unique way to order the values of a multiresponse function. To overcome this difficulty, Khuri and Conlon (1981) introduced a multiresponse optimization technique called the generalized distance approach. The following is an outline of this approach:

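Let r be the number of responses, and n be the number of experimental runs for all the responses. Suppose that these responses can be represented by the linear models

$$\mathbf{y}_i = \mathbf{X}\boldsymbol\beta_i + \boldsymbol\epsilon_i, \qquad i = 1, 2, \ldots, r,$$

where $\mathbf{y}_i$ is a vector of observations on the ith response, $\mathbf{X}$ is a known matrix of order $n \times p$ and rank p, $\boldsymbol\beta_i$ is a vector of p unknown parameters, and $\boldsymbol\epsilon_i$ is a random error vector associated with the ith response ($i = 1, 2, \ldots, r$). It is assumed that the rows of the error matrix $[\boldsymbol\epsilon_1 : \boldsymbol\epsilon_2 : \cdots : \boldsymbol\epsilon_r]$ are statistically independent, each having a zero mean vector and a common variance-covariance matrix $\boldsymbol\Sigma$. Note that the matrix $\mathbf{X}$ is assumed to be the same for all the responses.

Let $x_1, x_2, \ldots, x_k$ be input variables that influence the r responses. The predicted response value at a point $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ in a region R for the ith response is given by $\hat{y}_i(\mathbf{x}) = \mathbf{f}'(\mathbf{x})\hat{\boldsymbol\beta}_i$, where $\hat{\boldsymbol\beta}_i = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}_i$ is the least-squares estimator of $\boldsymbol\beta_i$ ($i = 1, 2, \ldots, r$). Here, $\mathbf{f}'(\mathbf{x})$ is of the same form as a row of $\mathbf{X}$, except that it is evaluated at the point $\mathbf{x}$. It follows that

$$\mathrm{Var}[\hat{y}_i(\mathbf{x})] = \sigma_{ii}\,\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x}), \qquad i = 1, 2, \ldots, r,$$

$$\mathrm{Cov}[\hat{y}_i(\mathbf{x}), \hat{y}_j(\mathbf{x})] = \sigma_{ij}\,\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x}), \qquad i \ne j = 1, 2, \ldots, r,$$

where $\sigma_{ij}$ is the (i, j)th element of $\boldsymbol\Sigma$. The variance-covariance matrix of $\hat{\mathbf{y}}(\mathbf{x}) = [\hat{y}_1(\mathbf{x}), \hat{y}_2(\mathbf{x}), \ldots, \hat{y}_r(\mathbf{x})]'$ is of the form

$$\mathrm{Var}[\hat{\mathbf{y}}(\mathbf{x})] = \mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})\,\boldsymbol\Sigma.$$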





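Since $\boldsymbol\Sigma$ is in general unknown, an unbiased estimator, $\hat{\boldsymbol\Sigma}$, of $\boldsymbol\Sigma$ can be used instead, where

$$\hat{\boldsymbol\Sigma} = \frac{1}{n-p}\,\mathbf{Y}'[\mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{Y},$$

and $\mathbf{Y} = [\mathbf{y}_1 : \mathbf{y}_2 : \cdots : \mathbf{y}_r]$. The matrix $\hat{\boldsymbol\Sigma}$ is nonsingular provided that $\mathbf{Y}$ is of rank $r \le n - p$. An estimate of $\mathrm{Var}[\hat{\mathbf{y}}(\mathbf{x})]$ is then given by

$$\widehat{\mathrm{Var}}[\hat{\mathbf{y}}(\mathbf{x})] = \mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})\,\hat{\boldsymbol\Sigma}. \qquad (8.69)$$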


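Let $\phi_i$ denote the optimum value of $\hat{y}_i(\mathbf{x})$ optimized individually over the region R ($i = 1, 2, \ldots, r$). Let $\boldsymbol\phi = (\phi_1, \phi_2, \ldots, \phi_r)'$. These individual optima do not in general occur at the same location in R. To achieve a compromise optimum, we need to find $\mathbf{x}$ that minimizes $\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\phi]$, where $\rho$ is some metric that measures the distance of $\hat{\mathbf{y}}(\mathbf{x})$ from $\boldsymbol\phi$. One possible choice for $\rho$ is the metric

$$\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\phi] = \left\{[\hat{\mathbf{y}}(\mathbf{x}) - \boldsymbol\phi]'\{\widehat{\mathrm{Var}}[\hat{\mathbf{y}}(\mathbf{x})]\}^{-1}[\hat{\mathbf{y}}(\mathbf{x}) - \boldsymbol\phi]\right\}^{1/2},$$

which, by formula (8.69), can be written as

$$\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\phi] = \left\{\frac{[\hat{\mathbf{y}}(\mathbf{x}) - \boldsymbol\phi]'\hat{\boldsymbol\Sigma}^{-1}[\hat{\mathbf{y}}(\mathbf{x}) - \boldsymbol\phi]}{\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})}\right\}^{1/2}. \qquad (8.70)$$

We note that $\rho = 0$ if and only if $\hat{\mathbf{y}}(\mathbf{x}) = \boldsymbol\phi$, that is, when all the responses attain their individual optima at the same point; otherwise, $\rho > 0$. Such a point (if it exists) is called a point of ideal optimum. In general, an ideal optimum rarely exists.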

In order to have conditions that are as close as possible to an ideal optimum, we need to minimize $\rho$ over the region R. Let us suppose that the minimum occurs at the point $\mathbf{x}_0 \in R$. Then, at $\mathbf{x}_0$ the experimental conditions can be described as being near optimal for each of the r response functions. We therefore refer to $\mathbf{x}_0$ as a point of compromise optimum.
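A minimal sketch of the generalized distance (8.70) for r = 2 responses, assuming a hypothetical one-factor experiment: nine equally spaced runs on [-1, 1], a second-order model for each response, simulated data, and a grid approximation of R. Both responses are taken to be maximized, so each $\phi_i$ is the largest predicted value on the grid.

```python
import numpy as np

rng = np.random.default_rng(0)
x_design = np.linspace(-1.0, 1.0, 9)                       # hypothetical 9-run design
X = np.column_stack([np.ones_like(x_design), x_design, x_design**2])
n, p = X.shape
# Two hypothetical responses, both to be maximized
Y = np.column_stack([
    5.0 + 2.0 * x_design - 3.0 * x_design**2 + rng.normal(0.0, 0.3, n),
    4.0 - 1.0 * x_design - 2.0 * x_design**2 + rng.normal(0.0, 0.3, n),
])

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y                                      # column i is beta_hat_i
Sigma_hat = Y.T @ (np.eye(n) - X @ XtX_inv @ X.T) @ Y / (n - p)
Sigma_inv = np.linalg.inv(Sigma_hat)

grid = np.linspace(-1.0, 1.0, 401)                         # discretized region R
F = np.column_stack([np.ones_like(grid), grid, grid**2])   # rows f'(x)
Y_hat = F @ B                                              # predicted responses
phi = Y_hat.max(axis=0)                                    # individual optima phi_i
diff = Y_hat - phi
num = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
den = np.einsum("ij,jk,ik->i", F, XtX_inv, F)              # f'(x)(X'X)^{-1}f(x)
rho = np.sqrt(np.maximum(num, 0.0) / den)                  # formula (8.70)
x0 = grid[np.argmin(rho)]                                  # compromise optimum
print("individual optima at", grid[Y_hat.argmax(axis=0)], "compromise x0 =", x0)
```

The division by $\mathbf{f}'(\mathbf{x})(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\mathbf{x})$ is what distinguishes (8.70) from a plain Mahalanobis distance: points where predictions are less precise are penalized less for deviating from $\boldsymbol\phi$.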

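Note that the elements of $\boldsymbol\phi$ in formula (8.70) are random variables since they are the individual optima of $\hat{y}_1(\mathbf{x}), \hat{y}_2(\mathbf{x}), \ldots, \hat{y}_r(\mathbf{x})$. If the variation associated with $\boldsymbol\phi$ is large, then the metric $\rho$ may not accurately measure the deviation of $\hat{\mathbf{y}}(\mathbf{x})$ from the true ideal optimum. In this case, some account should be taken of the randomness of $\boldsymbol\phi$ in the development of the metric. To do so, let $\boldsymbol\zeta = (\zeta_1, \zeta_2, \ldots, \zeta_r)'$, where $\zeta_i$ is the optimum value of the true mean of the ith response optimized individually over the region R ($i = 1, 2, \ldots, r$). Let $D_\zeta$ be a confidence region for $\boldsymbol\zeta$. For a fixed $\mathbf{x} \in R$ and whenever $\boldsymbol\zeta \in D_\zeta$, we obviously have

$$\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\zeta] \le \max_{\boldsymbol\eta\in D_\zeta} \rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\eta]. \qquad (8.71)$$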





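The right-hand side of this inequality serves as an upper bound on $\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\zeta]$, which represents the distance of $\hat{\mathbf{y}}(\mathbf{x})$ from the true ideal optimum. It follows that

$$\min_{\mathbf{x}\in R} \rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\zeta] \le \min_{\mathbf{x}\in R}\left\{\max_{\boldsymbol\eta\in D_\zeta} \rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\eta]\right\}. \qquad (8.72)$$

The right-hand side of this inequality provides a conservative measure of distance between the compromise and ideal optima.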

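The confidence region $D_\zeta$ can be determined in a variety of ways. Khuri and Conlon (1981) considered a rectangular confidence region of the form

$$D_\zeta = \{\boldsymbol\eta = (\eta_1, \eta_2, \ldots, \eta_r)' : \gamma_{li} \le \eta_i \le \gamma_{ui},\ i = 1, 2, \ldots, r\},$$

where

$$\gamma_{li} = \phi_i - g_i(\boldsymbol\xi_i)\,\mathrm{MS}_i^{1/2}\,t_{\alpha/2,\,n-p}, \qquad \gamma_{ui} = \phi_i + g_i(\boldsymbol\xi_i)\,\mathrm{MS}_i^{1/2}\,t_{\alpha/2,\,n-p}, \qquad (8.73)$$

where $\mathrm{MS}_i$ is the error mean square for the ith response, $\boldsymbol\xi_i$ is the point at which $\hat{y}_i(\mathbf{x})$ attains the individual optimum $\phi_i$, $t_{\alpha/2,\,n-p}$ is the upper $(\alpha/2) \times 100$th percentile of the t-distribution with $n - p$ degrees of freedom, and $g_i(\boldsymbol\xi_i)$ is given by

$$g_i(\boldsymbol\xi_i) = [\mathbf{f}'(\boldsymbol\xi_i)(\mathbf{X}'\mathbf{X})^{-1}\mathbf{f}(\boldsymbol\xi_i)]^{1/2}, \qquad i = 1, 2, \ldots, r.$$

Khuri and Conlon (1981) showed that such a rectangular confidence region has approximately a confidence coefficient of at least $1 - \alpha^*$, where $\alpha^* = 1 - (1 - \alpha)^r$.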

It should be noted that the evaluation of the right-hand side of inequality (8.72) requires that $\rho[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\eta]$ be maximized first with respect to $\boldsymbol\eta$ over $D_\zeta$ for a given $\mathbf{x} \in R$. The maximum value thus obtained, being a function of $\mathbf{x}$, is then minimized over the region R. A computer program for the implementation of this min-max procedure is described in Conlon and Khuri (1992). A complete electronic copy of the code, along with examples, can be downloaded from the Internet at ftp://ftp.stat.ufl.edU/pub/mr.tar.Z.

Numerical examples that illustrate the application of the generalized 
distance approach for multiresponse optimization can be found in Khuri and 
Conlon (1981) and Khuri and Cornell (1996, Chapter 7). 
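The min-max computation on the right-hand side of (8.72) can be sketched in a hypothetical setting like the one used for (8.70): nine runs on [-1, 1], two simulated second-order responses, both maximized. Two facts keep it short: the rectangular region (8.73) is a box, and $\rho^2[\hat{\mathbf{y}}(\mathbf{x}), \boldsymbol\eta]$ is a convex quadratic function of $\boldsymbol\eta$, so its maximum over the box is attained at one of the $2^r$ vertices. The t-value is hard-coded ($t_{0.025,6} \approx 2.447$ for $n - p = 6$) to avoid a SciPy dependency, and $\mathrm{MS}_i$ is taken as the ith diagonal element of $\hat{\boldsymbol\Sigma}$. This is an illustration only, not the Conlon and Khuri (1992) program.

```python
import itertools
import numpy as np

T_CRIT = 2.447   # t_{0.025, 6} from a t table; n - p = 6 in this hypothetical setup

rng = np.random.default_rng(0)
xd = np.linspace(-1.0, 1.0, 9)                            # hypothetical 9-run design
X = np.column_stack([np.ones_like(xd), xd, xd**2])
n, p = X.shape
Y = np.column_stack([5.0 + 2.0 * xd - 3.0 * xd**2,
                     4.0 - 1.0 * xd - 2.0 * xd**2]) + rng.normal(0.0, 0.3, (n, 2))
XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y
Sigma_hat = Y.T @ (np.eye(n) - X @ XtX_inv @ X.T) @ Y / (n - p)
Sigma_inv = np.linalg.inv(Sigma_hat)
ms = np.diag(Sigma_hat)                                   # error mean squares MS_i

grid = np.linspace(-1.0, 1.0, 201)                        # discretized region R
F = np.column_stack([np.ones_like(grid), grid, grid**2])
Y_hat = F @ B
scale = np.einsum("ij,jk,ik->i", F, XtX_inv, F)           # f'(x)(X'X)^{-1}f(x)
idx = Y_hat.argmax(axis=0)                                # grid index of xi_i
phi = Y_hat[idx, np.arange(2)]                            # individual maxima phi_i
half = np.sqrt(scale[idx]) * np.sqrt(ms) * T_CRIT         # g_i(xi_i) MS_i^{1/2} t
lower, upper = phi - half, phi + half                     # rectangular region (8.73)

def rho(eta):
    """Generalized distance (8.70) from y_hat(x) to eta, for every grid point x."""
    diff = Y_hat - eta
    num = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
    return np.sqrt(np.maximum(num, 0.0) / scale)

# rho^2 is convex in eta, so its maximum over the box is attained at a vertex
vertices = [np.array(v) for v in itertools.product(*zip(lower, upper))]
worst = np.max([rho(v) for v in vertices], axis=0)        # inner max for each x
x_minmax = grid[np.argmin(worst)]                         # outer min over R
print("min-max point:", x_minmax, "bound:", worst.min(), "rho(phi) min:", rho(phi).min())
```

With $\alpha = 0.05$ and $r = 2$, the box has an approximate confidence coefficient of at least $(0.95)^2 = 0.9025$.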


8.8. MAXIMUM LIKELIHOOD ESTIMATION 
AND THE EM ALGORITHM 

We recall from Section 7.11.2 that the maximum likelihood (ML) estimates of a set of parameters, $\theta_1, \theta_2, \ldots, \theta_p$, for a given distribution maximize the likelihood function of a sample, $X_1, X_2, \ldots, X_n$, of size n from the distribution. The ML estimates of the $\theta_i$'s, denoted by $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$, can be found by solving the likelihood equations (the likelihood function must be differentiable and unimodal)

$$\frac{\partial \log L(\mathbf{x}, \hat{\boldsymbol\theta})}{\partial \theta_i} = 0, \qquad i = 1, 2, \ldots, p, \qquad (8.74)$$

where $\hat{\boldsymbol\theta} = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p)'$, $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$, and $L(\mathbf{x}, \boldsymbol\theta) = f(\mathbf{x}, \boldsymbol\theta)$ with $f(\mathbf{x}, \boldsymbol\theta)$ being the density function (or probability mass function) of $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$. Note that $f(\mathbf{x}, \boldsymbol\theta)$ can be written as $\prod_{i=1}^{n} g(x_i, \boldsymbol\theta)$, where $g(x, \boldsymbol\theta)$ is the density function (or probability mass function) associated with the distribution.

Equations (8.74) may not have a closed-form solution. For example, consider the so-called truncated Poisson distribution whose probability mass function is of the form (see Everitt, 1987, page 29)

$$g(x, \theta) = \frac{e^{-\theta}\theta^x}{(1 - e^{-\theta})\,x!}, \qquad x = 1, 2, \ldots. \qquad (8.75)$$

In this case,

$$\log L(\mathbf{x}, \theta) = \log\left[\prod_{i=1}^{n} g(x_i, \theta)\right] = -n\theta + (\log\theta)\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \log x_i! - n\log(1 - e^{-\theta}).$$

Hence,

$$\frac{\partial L^*(\mathbf{x}, \theta)}{\partial \theta} = -n + \frac{1}{\theta}\sum_{i=1}^{n} x_i - \frac{n e^{-\theta}}{1 - e^{-\theta}}, \qquad (8.76)$$

where $L^*(\mathbf{x}, \theta) = \log L(\mathbf{x}, \theta)$ is the log-likelihood function. The likelihood equation, which results from equating the right-hand side of formula (8.76) to zero, has no closed-form solution for $\theta$.

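In general, if equations (8.74) do not have a closed-form solution, then, as was seen in Section 8.1, iterative methods can be applied to maximize $L(\mathbf{x}, \boldsymbol\theta)$ [or $L^*(\mathbf{x}, \boldsymbol\theta)$]. Using, for example, the Newton-Raphson method (see Section 8.1.2), if $\hat{\boldsymbol\theta}_0$ is an initial estimate of $\boldsymbol\theta$ and $\hat{\boldsymbol\theta}_i$ is the estimate at the ith iteration, then by applying formula (8.7) we have

$$\hat{\boldsymbol\theta}_{i+1} = \hat{\boldsymbol\theta}_i - \mathbf{H}_{L^*}^{-1}(\mathbf{x}, \hat{\boldsymbol\theta}_i)\,\nabla L^*(\mathbf{x}, \hat{\boldsymbol\theta}_i), \qquad i = 0, 1, 2, \ldots,$$

where $\mathbf{H}_{L^*}(\mathbf{x}, \boldsymbol\theta)$ and $\nabla L^*(\mathbf{x}, \boldsymbol\theta)$ are, respectively, the Hessian matrix and gradient vector of the log-likelihood function. Several iterations can be made until a certain convergence criterion is satisfied. A modification of this procedure is the so-called Fisher's method of scoring, where $\mathbf{H}_{L^*}$ is replaced by its expected value, that is,

$$\hat{\boldsymbol\theta}_{i+1} = \hat{\boldsymbol\theta}_i - \{E[\mathbf{H}_{L^*}(\mathbf{x}, \hat{\boldsymbol\theta}_i)]\}^{-1}\,\nabla L^*(\mathbf{x}, \hat{\boldsymbol\theta}_i), \qquad i = 0, 1, 2, \ldots. \qquad (8.77)$$

Here, the expected value is taken with respect to the given distribution.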


Example 8.8.1. (Everitt, 1987, pages 30-31). Consider the truncated Poisson distribution described in formula (8.75). In this case, since we only have one parameter $\theta$, the gradient takes the form $\nabla L^*(\mathbf{x}, \theta) = \partial L^*(\mathbf{x}, \theta)/\partial\theta$, which is given by formula (8.76). Hence, the Hessian matrix is

$$H_{L^*}(\mathbf{x}, \theta) = \frac{\partial^2 L^*(\mathbf{x}, \theta)}{\partial \theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{n} x_i + \frac{n e^{-\theta}}{(1 - e^{-\theta})^2}.$$

Furthermore, if X denotes the truncated Poisson random variable, then

$$E(X) = \sum_{x=1}^{\infty} \frac{x e^{-\theta}\theta^x}{(1 - e^{-\theta})\,x!} = \frac{\theta}{1 - e^{-\theta}}.$$

Thus

$$E[H_{L^*}(\mathbf{x}, \theta)] = E\left[\frac{\partial^2 L^*(\mathbf{x}, \theta)}{\partial \theta^2}\right] = -\frac{1}{\theta^2}\cdot\frac{n\theta}{1 - e^{-\theta}} + \frac{n e^{-\theta}}{(1 - e^{-\theta})^2} = \frac{n e^{-\theta}(1 + \theta) - n}{\theta(1 - e^{-\theta})^2}.$$

Suppose now we have the sample 1, 2, 3, 4, 5, 6 from this distribution. Let $\hat\theta_0 = 1.5118$ be an initial estimate of $\theta$. Several iterations are made by applying formula (8.77), and the results are shown in Table 8.6. The final estimate of $\theta$ is 0.8925, which is considered to be the maximum likelihood estimate of $\theta$ for the given sample. The convergence criterion used here is $|\hat\theta_{i+1} - \hat\theta_i| < 0.001$.

Table 8.6. Fisher's Method of Scoring for the Truncated Poisson Distribution

Iteration   ∂L*/∂θ       ∂²L*/∂θ²      θ         L*
1           -685.5137    -1176.7632    1.5118    -1545.5549
2            -62.0889    -1696.2834    0.9293    -1303.3340
3             -0.2822    -1750.5906    0.8927    -1302.1790
4              0.0012    -1750.8389    0.8925    -1302.1792

Source: Everitt (1987, page 31). Reproduced with permission of Chapman and Hall, London.
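Fisher's method of scoring for this example can be sketched in a few lines of standard-library Python. One assumption must be flagged: the text lists only the values 1 through 6, not how often each occurs, so the counts used below are an assumption. They were chosen because they reproduce the first row of Table 8.6 ($\partial L^*/\partial\theta \approx -685.51$ and an expected second derivative of about $-1176.76$ at $\theta = 1.5118$, which is also the sample mean they imply).

```python
import math

# ASSUMPTION: frequencies of the values 1..6 are not given in the text; these counts
# are an assumption chosen because they reproduce the first row of Table 8.6.
values = [1, 2, 3, 4, 5, 6]
counts = [1486, 694, 195, 37, 10, 1]
n = sum(counts)
sum_x = sum(v * c for v, c in zip(values, counts))

def score(theta):
    """dL*/dtheta, formula (8.76)."""
    e = math.exp(-theta)
    return -n + sum_x / theta - n * e / (1.0 - e)

def expected_hessian(theta):
    """E[d^2 L*/dtheta^2] = [n e^{-theta}(1 + theta) - n] / [theta (1 - e^{-theta})^2]."""
    e = math.exp(-theta)
    return (n * e * (1.0 + theta) - n) / (theta * (1.0 - e) ** 2)

def fisher_scoring(theta0, tol=1e-3, max_iter=50):
    """Iterate formula (8.77) until |theta_{i+1} - theta_i| < tol."""
    history = [theta0]
    theta = theta0
    for _ in range(max_iter):
        theta_new = theta - score(theta) / expected_hessian(theta)
        history.append(theta_new)
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return history

history = fisher_scoring(1.5118)
print([round(t, 4) for t in history])
```

Because the expected Hessian is always negative here, each scoring step moves $\theta$ in the direction of the score, which is what makes the method stable from a crude starting value such as the sample mean.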

8.8.1. The EM Algorithm 

The EM algorithm is a general iterative procedure for maximum likelihood estimation in incomplete data problems. This encompasses situations involving missing data, or when the actual data are viewed as forming a subset of a larger system of quantities.

The term EM was introduced by Dempster, Laird, and Rubin (1977). The reason for this terminology is that each iteration in this algorithm consists of two steps called the expectation step (E-step) and the maximization step (M-step). In the E-step, the conditional expectations of the missing data are found given the observed data and the current estimates of the parameters. These expected values are then substituted for the missing data and used to complete the data. In the M-step, maximum likelihood estimation of the parameters is performed in the usual manner using the completed data. More generally, missing sufficient statistics can be estimated rather than the individual missing data. The estimated parameters are then used to reestimate the missing data (or missing sufficient statistics), which in turn lead to new parameter estimates. This defines an iterative procedure, which can be carried out until convergence is achieved.

More details concerning the theory of the EM algorithm can be found in 
Dempster, Laird, and Rubin (1977), and in Little and Rubin (1987, Chapter 
7). The following two examples, given in the latter reference, illustrate the 
application of this algorithm: 

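Example 8.8.2. (Little and Rubin, 1987, pages 130-131). Consider a sample of size n from a normal distribution with a mean $\mu$ and a variance $\sigma^2$. Suppose that $x_1, x_2, \ldots, x_m$ are observed data and that $x_{m+1}, x_{m+2}, \ldots, x_n$ are missing data. Let $\mathbf{x}_{\text{obs}} = (x_1, x_2, \ldots, x_m)'$. For $i = m + 1, m + 2, \ldots, n$, the expected value of $X_i$ given $\mathbf{x}_{\text{obs}}$ and $\boldsymbol\theta = (\mu, \sigma^2)'$ is $\mu$. Now, from Example 7.11.3, the log-likelihood function for the complete data set is

$$L^*(\mathbf{x}, \boldsymbol\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right), \qquad (8.78)$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. We note that $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} x_i^2$ are sufficient statistics. Therefore, to apply the E-step of the algorithm, we only have to find the conditional expectations of these statistics given $\mathbf{x}_{\text{obs}}$ and the current estimate of $\boldsymbol\theta$. We thus have

$$E\left(\sum_{i=1}^{n} X_i \,\middle|\, \hat{\boldsymbol\theta}_j, \mathbf{x}_{\text{obs}}\right) = \sum_{i=1}^{m} x_i + (n - m)\hat\mu_j, \qquad j = 0, 1, 2, \ldots, \qquad (8.79)$$

$$E\left(\sum_{i=1}^{n} X_i^2 \,\middle|\, \hat{\boldsymbol\theta}_j, \mathbf{x}_{\text{obs}}\right) = \sum_{i=1}^{m} x_i^2 + (n - m)(\hat\mu_j^2 + \hat\sigma_j^2), \qquad j = 0, 1, 2, \ldots, \qquad (8.80)$$

where $\hat{\boldsymbol\theta}_j = (\hat\mu_j, \hat\sigma_j^2)'$ is the estimate of $\boldsymbol\theta$ at the jth iteration, with $\hat{\boldsymbol\theta}_0$ being an initial estimate.

From Section 7.11 we recall that the maximum likelihood estimates of $\mu$ and $\sigma^2$ based on the complete data set are $(1/n)\sum_{i=1}^{n} x_i$ and $(1/n)\sum_{i=1}^{n} x_i^2 - [(1/n)\sum_{i=1}^{n} x_i]^2$. Thus in the M-step, these same expressions are used, except that the current expectations of the sufficient statistics in formulas (8.79) and (8.80) are substituted for the missing data portion of the sufficient statistics. In other words, the estimates of $\mu$ and $\sigma^2$ at the $(j + 1)$th iteration are given by

$$\hat\mu_{j+1} = \frac{1}{n}\left[\sum_{i=1}^{m} x_i + (n - m)\hat\mu_j\right], \qquad j = 0, 1, 2, \ldots, \qquad (8.81)$$

$$\hat\sigma_{j+1}^2 = \frac{1}{n}\left[\sum_{i=1}^{m} x_i^2 + (n - m)(\hat\mu_j^2 + \hat\sigma_j^2)\right] - \hat\mu_{j+1}^2, \qquad j = 0, 1, 2, \ldots. \qquad (8.82)$$

By setting $\hat\mu_j = \hat\mu_{j+1} = \hat\mu$ and $\hat\sigma_j^2 = \hat\sigma_{j+1}^2 = \hat\sigma^2$ in equations (8.81) and (8.82), we find that the iterations converge to

$$\hat\mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \hat\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} x_i^2 - \hat\mu^2,$$

which are the maximum likelihood estimates of $\mu$ and $\sigma^2$ from $\mathbf{x}_{\text{obs}}$. The EM algorithm is unnecessary in this example, since the maximum likelihood estimates of $\mu$ and $\sigma^2$ can be obtained explicitly.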
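A short sketch of iterations (8.81) and (8.82) in plain Python, with a hypothetical sample of m = 6 observed values out of n = 10. Whatever the starting values, the iterations should settle at the observed-data estimates $\hat\mu = (1/m)\sum_{i=1}^m x_i$ and $\hat\sigma^2 = (1/m)\sum_{i=1}^m x_i^2 - \hat\mu^2$.

```python
def em_normal(x_obs, n, mu0, sigma2_0, tol=1e-10, max_iter=10_000):
    """EM iterations (8.81)-(8.82) for a normal sample with n - m values missing."""
    m = len(x_obs)
    s1 = sum(x_obs)                      # observed part of sum x_i
    s2 = sum(x * x for x in x_obs)       # observed part of sum x_i^2
    mu, sigma2 = mu0, sigma2_0
    for _ in range(max_iter):
        # E-step: expected sufficient statistics, formulas (8.79)-(8.80)
        t1 = s1 + (n - m) * mu
        t2 = s2 + (n - m) * (mu * mu + sigma2)
        # M-step: complete-data ML formulas with the expectations plugged in
        mu_new = t1 / n
        sigma2_new = t2 / n - mu_new * mu_new
        if abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol:
            return mu_new, sigma2_new
        mu, sigma2 = mu_new, sigma2_new
    return mu, sigma2

x_obs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # hypothetical observed values (m = 6)
mu_hat, sigma2_hat = em_normal(x_obs, n=10, mu0=0.0, sigma2_0=1.0)
print(mu_hat, sigma2_hat)
```

Each iteration shrinks the error in $\hat\mu$ by the factor $(n - m)/n$, the fraction of missing values, which illustrates why EM slows down when much of the data is missing.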

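Example 8.8.3. (Little and Rubin, 1987, pages 131-132). This example was originally given in Dempster, Laird, and Rubin (1977). It involves a multinomial $\mathbf{x} = (x_1, x_2, x_3, x_4)'$ with cell probabilities

$$\left(\tfrac12 - \tfrac12\theta,\ \tfrac14\theta,\ \tfrac14\theta,\ \tfrac12\right),$$

where $0 < \theta < 1$. Suppose that the observed data consist of $\mathbf{x}_{\text{obs}} = (38, 34, 125)'$ such that $x_1 = 38$, $x_2 = 34$, $x_3 + x_4 = 125$. The likelihood function for the complete data is

$$L(\mathbf{x}, \theta) = \frac{(x_1 + x_2 + x_3 + x_4)!}{x_1!\,x_2!\,x_3!\,x_4!}\left(\tfrac12 - \tfrac12\theta\right)^{x_1}\left(\tfrac14\theta\right)^{x_2}\left(\tfrac14\theta\right)^{x_3}\left(\tfrac12\right)^{x_4}.$$

The log-likelihood function is of the form

$$L^*(\mathbf{x}, \theta) = \log\left[\frac{(x_1 + x_2 + x_3 + x_4)!}{x_1!\,x_2!\,x_3!\,x_4!}\right] + x_1\log\left(\tfrac12 - \tfrac12\theta\right) + x_2\log\left(\tfrac14\theta\right) + x_3\log\left(\tfrac14\theta\right) + x_4\log\left(\tfrac12\right).$$

By differentiating $L^*(\mathbf{x}, \theta)$ with respect to $\theta$ and equating the derivative to zero we obtain

$$-\frac{x_1}{1 - \theta} + \frac{x_2}{\theta} + \frac{x_3}{\theta} = 0.$$

Hence, the maximum likelihood estimate of $\theta$ for the complete data set is

$$\hat\theta = \frac{x_2 + x_3}{x_1 + x_2 + x_3}. \qquad (8.83)$$

Let us now find the conditional expectations of $X_1, X_2, X_3, X_4$ given the observed data and the current estimate of $\theta$:

$$E(X_1 \mid \hat\theta_i, \mathbf{x}_{\text{obs}}) = 38, \qquad E(X_2 \mid \hat\theta_i, \mathbf{x}_{\text{obs}}) = 34,$$

$$E(X_3 \mid \hat\theta_i, \mathbf{x}_{\text{obs}}) = \frac{125\left(\tfrac14\hat\theta_i\right)}{\tfrac12 + \tfrac14\hat\theta_i}, \qquad E(X_4 \mid \hat\theta_i, \mathbf{x}_{\text{obs}}) = \frac{125\left(\tfrac12\right)}{\tfrac12 + \tfrac14\hat\theta_i}.$$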


Table 8.7. The EM Algorithm for Example 8.8.3

Iteration   θ̂
0           0.500000000
1           0.608247423
2           0.624321051
3           0.626488879
4           0.626777323
5           0.626815632
6           0.626820719
7           0.626821395
8           0.626821484

Source: Little and Rubin (1987, page 132). Reproduced with permission of John Wiley & Sons, Inc.


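Thus at the $(i + 1)$st iteration we have

$$\hat\theta_{i+1} = \frac{34 + (125)\left(\tfrac14\hat\theta_i\right)/\left(\tfrac12 + \tfrac14\hat\theta_i\right)}{38 + 34 + (125)\left(\tfrac14\hat\theta_i\right)/\left(\tfrac12 + \tfrac14\hat\theta_i\right)}, \qquad (8.84)$$

as can be seen from applying formula (8.83) using the conditional expectation of $X_3$ instead of $x_3$. Formula (8.84) can be used iteratively to obtain the maximum likelihood estimate of $\theta$ on the basis of the observed data. Using an initial estimate $\hat\theta_0 = \tfrac12$, the results of this iterative procedure are given in Table 8.7. Note that if we set $\hat\theta_{i+1} = \hat\theta_i = \hat\theta$ in formula (8.84), and use $\left(\tfrac14\hat\theta\right)/\left(\tfrac12 + \tfrac14\hat\theta\right) = \hat\theta/(2 + \hat\theta)$, we obtain the quadratic equation

$$197\hat\theta^2 - 15\hat\theta - 68 = 0,$$

whose only positive root is $\hat\theta = 0.626821498$, which is very close to the value obtained in the last iteration in Table 8.7.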
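Iteration (8.84) is easy to reproduce. The sketch below starts from $\hat\theta_0 = 1/2$, as in Table 8.7, and compares the iterates with the positive root of $197\theta^2 - 15\theta - 68 = 0$.

```python
import math

def em_step(theta):
    """One EM iteration for Example 8.8.3, formula (8.84)."""
    e_x3 = 125 * (0.25 * theta) / (0.5 + 0.25 * theta)   # E-step: E(X3 | theta, x_obs)
    return (34 + e_x3) / (38 + 34 + e_x3)                # M-step: formula (8.83)

history = [0.5]                                          # initial estimate theta_0 = 1/2
for _ in range(8):
    history.append(em_step(history[-1]))
for k, t in enumerate(history):
    print(k, f"{t:.9f}")

# Fixed point of (8.84): positive root of 197 theta^2 - 15 theta - 68 = 0
root = (15 + math.sqrt(15**2 + 4 * 197 * 68)) / (2 * 197)
print(f"{root:.9f}")
```

The printed iterates should match the second column of Table 8.7; the error shrinks by a roughly constant factor of about 0.13 per iteration, the linear convergence typical of EM.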


8.9. MINIMUM NORM QUADRATIC UNBIASED ESTIMATION
OF VARIANCE COMPONENTS

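Consider the linear model

$$\mathbf{y} = \mathbf{X}\boldsymbol\alpha + \sum_{i=1}^{c}\mathbf{U}_i\boldsymbol\beta_i, \qquad (8.85)$$

where $\mathbf{y}$ is a vector of n observations; $\boldsymbol\alpha$ is a vector of fixed effects; $\boldsymbol\beta_1, \boldsymbol\beta_2, \ldots, \boldsymbol\beta_c$ are vectors of random effects; $\mathbf{X}, \mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_c$ are known matrices of constants with $\boldsymbol\beta_c = \boldsymbol\epsilon$, the vector of random errors; and $\mathbf{U}_c = \mathbf{I}_n$. We assume that the $\boldsymbol\beta_i$'s are uncorrelated with zero mean vectors and variance-covariance matrices $\sigma_i^2\mathbf{I}_{m_i}$, where $m_i$ is the number of columns of $\mathbf{U}_i$ ($i = 1, 2, \ldots, c$). The variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_c^2$ are referred to as variance components. Model (8.85) can be written as

$$\mathbf{y} = \mathbf{X}\boldsymbol\alpha + \mathbf{U}\boldsymbol\beta, \qquad (8.86)$$

where $\mathbf{U} = [\mathbf{U}_1 : \mathbf{U}_2 : \cdots : \mathbf{U}_c]$, $\boldsymbol\beta = (\boldsymbol\beta_1', \boldsymbol\beta_2', \ldots, \boldsymbol\beta_c')'$. From model (8.86) we have

$$E(\mathbf{y}) = \mathbf{X}\boldsymbol\alpha, \qquad \mathrm{Var}(\mathbf{y}) = \sum_{i=1}^{c}\sigma_i^2\mathbf{V}_i, \qquad (8.87)$$

with $\mathbf{V}_i = \mathbf{U}_i\mathbf{U}_i'$.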

Let us consider the estimation of a linear function of the variance components, namely, $\sum_{i=1}^{c} a_i\sigma_i^2$, where the $a_i$'s are known constants, by a quadratic estimator of the form $\mathbf{y}'\mathbf{A}\mathbf{y}$. Here, $\mathbf{A}$ is a symmetric matrix to be determined so that $\mathbf{y}'\mathbf{A}\mathbf{y}$ satisfies certain criteria, which are the following:

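1. Translation Invariance. If instead of $\boldsymbol\alpha$ we consider $\boldsymbol\gamma = \boldsymbol\alpha - \boldsymbol\alpha_0$, then from model (8.86) we have

$$\mathbf{y} - \mathbf{X}\boldsymbol\alpha_0 = \mathbf{X}\boldsymbol\gamma + \mathbf{U}\boldsymbol\beta.$$

In this case, $\sum_{i=1}^{c} a_i\sigma_i^2$ is estimated by $(\mathbf{y} - \mathbf{X}\boldsymbol\alpha_0)'\mathbf{A}(\mathbf{y} - \mathbf{X}\boldsymbol\alpha_0)$. The estimator $\mathbf{y}'\mathbf{A}\mathbf{y}$ is said to be translation invariant if

$$\mathbf{y}'\mathbf{A}\mathbf{y} = (\mathbf{y} - \mathbf{X}\boldsymbol\alpha_0)'\mathbf{A}(\mathbf{y} - \mathbf{X}\boldsymbol\alpha_0).$$

In order for this to be true for every $\boldsymbol\alpha_0$ we must have

$$\mathbf{A}\mathbf{X} = \mathbf{0}. \qquad (8.88)$$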


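2. Unbiasedness. $E(\mathbf{y}'\mathbf{A}\mathbf{y}) = \sum_{i=1}^{c} a_i\sigma_i^2$. Using a result in Searle (1971, Theorem 1, page 55), the expected value of the quadratic form $\mathbf{y}'\mathbf{A}\mathbf{y}$ is given by

$$E(\mathbf{y}'\mathbf{A}\mathbf{y}) = \boldsymbol\alpha'\mathbf{X}'\mathbf{A}\mathbf{X}\boldsymbol\alpha + \mathrm{tr}[\mathbf{A}\,\mathrm{Var}(\mathbf{y})], \qquad (8.89)$$

since $E(\mathbf{y}) = \mathbf{X}\boldsymbol\alpha$. From formulas (8.87), (8.88), and (8.89) we then have

$$E(\mathbf{y}'\mathbf{A}\mathbf{y}) = \sum_{i=1}^{c}\sigma_i^2\,\mathrm{tr}(\mathbf{A}\mathbf{V}_i). \qquad (8.90)$$

By comparison with $\sum_{i=1}^{c} a_i\sigma_i^2$, the condition for unbiasedness is

$$a_i = \mathrm{tr}(\mathbf{A}\mathbf{V}_i), \qquad i = 1, 2, \ldots, c. \qquad (8.91)$$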



380 


OPTIMIZATION IN STATISTICS 


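3. Minimum Norm. If $\boldsymbol\beta_1, \boldsymbol\beta_2, \ldots, \boldsymbol\beta_c$ in model (8.85) were observable, then a natural unbiased estimator of $\sum_{i=1}^{c} a_i\sigma_i^2$ would be $\sum_{i=1}^{c} a_i\boldsymbol\beta_i'\boldsymbol\beta_i/m_i$, since $E(\boldsymbol\beta_i'\boldsymbol\beta_i) = \mathrm{tr}(\mathbf{I}_{m_i}\sigma_i^2) = m_i\sigma_i^2$, $i = 1, 2, \ldots, c$. This estimator can be written as $\boldsymbol\beta'\boldsymbol\Delta\boldsymbol\beta$, where $\boldsymbol\Delta$ is the block-diagonal matrix

$$\boldsymbol\Delta = \mathrm{Diag}\left(\frac{a_1}{m_1}\mathbf{I}_{m_1}, \frac{a_2}{m_2}\mathbf{I}_{m_2}, \ldots, \frac{a_c}{m_c}\mathbf{I}_{m_c}\right).$$

The difference between this estimator and $\mathbf{y}'\mathbf{A}\mathbf{y}$ is

$$\mathbf{y}'\mathbf{A}\mathbf{y} - \boldsymbol\beta'\boldsymbol\Delta\boldsymbol\beta = \boldsymbol\beta'(\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta)\boldsymbol\beta,$$

since $\mathbf{A}\mathbf{X} = \mathbf{0}$. This difference can be made small by minimizing the Euclidean norm $\|\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta\|_2$.

The quadratic estimator $\mathbf{y}'\mathbf{A}\mathbf{y}$ is said to be a minimum norm quadratic unbiased estimator (MINQUE) of $\sum_{i=1}^{c} a_i\sigma_i^2$ if the matrix $\mathbf{A}$ is determined so that $\|\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta\|_2$ attains a minimum subject to the conditions given in formulas (8.88) and (8.91). Such an estimator was introduced by Rao (1971, 1972).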

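The minimization of $\|\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta\|_2^2$ is equivalent to that of $\mathrm{tr}(\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V})$, where $\mathbf{V} = \sum_{i=1}^{c}\mathbf{V}_i$. The reason for this is the following:

$$\|\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta\|_2^2 = \mathrm{tr}[(\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta)(\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta)] = \mathrm{tr}(\mathbf{U}'\mathbf{A}\mathbf{U}\mathbf{U}'\mathbf{A}\mathbf{U}) - 2\,\mathrm{tr}(\mathbf{U}'\mathbf{A}\mathbf{U}\boldsymbol\Delta) + \mathrm{tr}(\boldsymbol\Delta^2). \qquad (8.92)$$

Now,

$$\begin{aligned}
\mathrm{tr}(\mathbf{U}'\mathbf{A}\mathbf{U}\boldsymbol\Delta) &= \mathrm{tr}(\mathbf{A}\mathbf{U}\boldsymbol\Delta\mathbf{U}') = \mathrm{tr}\left(\mathbf{A}\sum_{i=1}^{c}\frac{a_i}{m_i}\mathbf{U}_i\mathbf{U}_i'\right) = \mathrm{tr}\left(\sum_{i=1}^{c}\frac{a_i}{m_i}\mathbf{A}\mathbf{V}_i\right) \\
&= \sum_{i=1}^{c}\frac{a_i}{m_i}\mathrm{tr}(\mathbf{A}\mathbf{V}_i) = \sum_{i=1}^{c}\frac{a_i^2}{m_i} \qquad \text{by (8.91)} \\
&= \mathrm{tr}(\boldsymbol\Delta^2).
\end{aligned}$$

Formula (8.92) can then be written as

$$\|\mathbf{U}'\mathbf{A}\mathbf{U} - \boldsymbol\Delta\|_2^2 = \mathrm{tr}(\mathbf{U}'\mathbf{A}\mathbf{U}\mathbf{U}'\mathbf{A}\mathbf{U}) - \mathrm{tr}(\boldsymbol\Delta^2) = \mathrm{tr}(\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}) - \mathrm{tr}(\boldsymbol\Delta^2),$$

since $\mathbf{V} = \sum_{i=1}^{c}\mathbf{U}_i\mathbf{U}_i' = \mathbf{U}\mathbf{U}'$. The trace of $\boldsymbol\Delta^2$ does not involve $\mathbf{A}$; hence the problem of MINQUE reduces to finding $\mathbf{A}$ that minimizes $\mathrm{tr}(\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V})$ subject to conditions (8.88) and (8.91). Rao (1971) showed that the solution to this optimization problem is of the form

$$\mathbf{A} = \sum_{i=1}^{c}\lambda_i\mathbf{R}\mathbf{V}_i\mathbf{R}, \qquad (8.93)$$

where

$$\mathbf{R} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}'\mathbf{V}^{-1},$$

with $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}$ being a generalized inverse of $\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}$, and the $\lambda_i$'s are obtained from solving the equations

$$\sum_{i=1}^{c}\lambda_i\,\mathrm{tr}(\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j) = a_j, \qquad j = 1, 2, \ldots, c,$$

which can be expressed as

$$\boldsymbol\lambda'\mathbf{S} = \mathbf{a}', \qquad (8.94)$$

where $\boldsymbol\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_c)'$, $\mathbf{S}$ is the $c \times c$ matrix $(s_{ij})$ with $s_{ij} = \mathrm{tr}(\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j)$, and $\mathbf{a} = (a_1, a_2, \ldots, a_c)'$. The MINQUE of $\sum_{i=1}^{c} a_i\sigma_i^2$ can then be written as

$$\mathbf{y}'\mathbf{A}\mathbf{y} = \sum_{i=1}^{c}\lambda_i\,\mathbf{y}'\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{y} = \boldsymbol\lambda'\mathbf{q},$$

where $\mathbf{q} = (q_1, q_2, \ldots, q_c)'$ with $q_i = \mathbf{y}'\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{y}$ ($i = 1, 2, \ldots, c$). But, from formula (8.94), $\boldsymbol\lambda' = \mathbf{a}'\mathbf{S}^{-}$, where $\mathbf{S}^{-}$ is a generalized inverse of $\mathbf{S}$. Hence, $\boldsymbol\lambda'\mathbf{q} = \mathbf{a}'\mathbf{S}^{-}\mathbf{q} = \mathbf{a}'\hat{\boldsymbol\sigma}$, where $\hat{\boldsymbol\sigma} = (\hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_c^2)'$ is a solution of the equation

$$\mathbf{S}\hat{\boldsymbol\sigma} = \mathbf{q}. \qquad (8.95)$$

This equation has a unique solution if and only if the individual variance components are unbiasedly estimable (see Rao, 1972, page 114). Thus the MINQUEs of the $\sigma_i^2$'s are obtained from solving equation (8.95).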

If the random effects in model (8.85) are assumed to be normally 
distributed, then the MINQUEs of the variance components reduce to 
the so-called minimum variance quadratic unbiased estimators 
(MIVQUEs). An example that shows how to compute these estimators 
in the case of a random one-way classification model is given in 
Swallow and Searle (1978). See also Milliken and Johnson (1984, 
Chapter 19). 
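A minimal sketch of the MINQUE computations (8.93)-(8.95) for a hypothetical unbalanced one-way random model $\mathbf{y} = \mu\mathbf{1} + \mathbf{U}_1\boldsymbol\beta_1 + \boldsymbol\epsilon$ with $c = 2$; the group sizes 3, 5, 2, 4 and the simulated data are assumptions. NumPy's Moore-Penrose inverse serves as the generalized inverse in $\mathbf{R}$, and the matrix $\mathbf{A}$ of (8.93) is also formed for $\mathbf{a} = (1, 0)'$, that is, for estimating $\sigma_1^2$ alone.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [3, 5, 2, 4]                        # hypothetical unbalanced group sizes
n = sum(sizes)
X = np.ones((n, 1))                         # fixed effects: overall mean only
U1 = np.zeros((n, len(sizes)))              # incidence matrix of the random group effects
start = 0
for g, k in enumerate(sizes):
    U1[start:start + k, g] = 1.0
    start += k
Us = [U1, np.eye(n)]                        # U_c = I_n carries the error term
Vs = [U @ U.T for U in Us]                  # V_i = U_i U_i'
y = 10.0 + U1 @ rng.normal(0.0, 2.0, len(sizes)) + rng.normal(0.0, 1.0, n)  # simulated

V = sum(Vs)                                 # V = sum_i V_i
V_inv = np.linalg.inv(V)
R = V_inv - V_inv @ X @ np.linalg.pinv(X.T @ V_inv @ X) @ X.T @ V_inv
S = np.array([[np.trace(R @ Vi @ R @ Vj) for Vj in Vs] for Vi in Vs])
q = np.array([y @ R @ Vi @ R @ y for Vi in Vs])
sigma2_hat = np.linalg.solve(S, q)          # equation (8.95); S is nonsingular here
print("MINQUEs of sigma_1^2 and sigma^2:", sigma2_hat)

# Matrix A of (8.93) for a = (1, 0)': S is symmetric, so lambda'S = a' means S lambda = a
a = np.array([1.0, 0.0])
lam = np.linalg.solve(S, a)
A = sum(l * (R @ Vi @ R) for l, Vi in zip(lam, Vs))
```

The resulting $\mathbf{A}$ can be checked directly against conditions (8.88) and (8.91), and $\mathbf{y}'\mathbf{A}\mathbf{y}$ should equal the first component of sigma2_hat. Like any MINQUE, the estimates are not guaranteed to be nonnegative.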


8.10. SCHEFFÉ’S CONFIDENCE INTERVALS

Consider the linear model


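$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon, \qquad (8.96)$$

where $\mathbf{y}$ is a vector of n observations, $\mathbf{X}$ is a known matrix of order $n \times p$ and rank r ($\le p$), $\boldsymbol\beta$ is a vector of unknown parameters, and $\boldsymbol\epsilon$ is a random error vector. It is assumed that $\boldsymbol\epsilon$ has the normal distribution with a mean $\mathbf{0}$ and a variance-covariance matrix $\sigma^2\mathbf{I}_n$. Let $\psi = \mathbf{a}'\boldsymbol\beta$ be an estimable linear function of the elements of $\boldsymbol\beta$. By this we mean that there exists a linear function $\mathbf{t}'\mathbf{y}$ of $\mathbf{y}$ such that $E(\mathbf{t}'\mathbf{y}) = \psi$, where $\mathbf{t}$ is some constant vector. A necessary and sufficient condition for $\psi$ to be estimable is that $\mathbf{a}'$ belongs to the row space of $\mathbf{X}$, that is, $\mathbf{a}'$ is a linear combination of the rows of $\mathbf{X}$ (see, for example, Searle, 1971, page 181). Since the rank of $\mathbf{X}$ is r, the row space of $\mathbf{X}$, denoted by $\rho(\mathbf{X})$, is an r-dimensional subspace of the p-dimensional Euclidean space $R^p$. Thus $\mathbf{a}'\boldsymbol\beta$ is estimable if and only if $\mathbf{a}' \in \rho(\mathbf{X})$.

Suppose that $\mathbf{a}'$ is an arbitrary vector in a q-dimensional subspace $\mathcal{L}$ of $\rho(\mathbf{X})$, where $q \le r$. Then $\mathbf{a}'\hat{\boldsymbol\beta} = \mathbf{a}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$ is the best linear unbiased estimator of $\mathbf{a}'\boldsymbol\beta$, and its variance is given by

$$\mathrm{Var}(\mathbf{a}'\hat{\boldsymbol\beta}) = \sigma^2\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{a},$$

where $(\mathbf{X}'\mathbf{X})^{-}$ is a generalized inverse of $\mathbf{X}'\mathbf{X}$ (see, for example, Searle, 1971, pages 181-182). Both $\mathbf{a}'\hat{\boldsymbol\beta}$ and $\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{a}$ are invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$, since $\mathbf{a}'\boldsymbol\beta$ is estimable (see, for example, Searle, 1971, page 181). In particular, if $r = p$, then $\mathbf{X}'\mathbf{X}$ is of full rank and $(\mathbf{X}'\mathbf{X})^{-} = (\mathbf{X}'\mathbf{X})^{-1}$.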

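Theorem 8.10.1. Simultaneous $(1-\alpha)100\%$ confidence intervals on $a'\beta$ for all $a' \in \mathcal{L}$, where $\mathcal{L}$ is a $q$-dimensional subspace of $\rho(X)$, are of the form

$$a'\hat{\beta} \mp (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}\,[a'(X'X)^{-}a]^{1/2}, \qquad (8.97)$$

where $F_{\alpha,q,n-r}$ is the upper $\alpha 100$th percentile of the $F$-distribution with $q$ and $n-r$ degrees of freedom, and $MS_E$ is the error mean square given by

$$MS_E = \frac{1}{n-r}\, y'[I_n - X(X'X)^{-}X']\,y. \qquad (8.98)$$

In Theorem 8.10.1, the word “simultaneous” means that with probability $1-\alpha$, the values of $a'\beta$ for all $a' \in \mathcal{L}$ satisfy the double inequality

$$a'\hat{\beta} - (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[a'(X'X)^{-}a]^{1/2} \le a'\beta \le a'\hat{\beta} + (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[a'(X'X)^{-}a]^{1/2}. \qquad (8.99)$$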
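For a single $a' \in \mathcal{L}$, formula (8.97) is plain arithmetic once $a'\hat{\beta}$, $a'(X'X)^{-}a$, $MS_E$, and the $F$ critical value are known. The sketch below is a minimal Python helper written under that assumption; the function name `scheffe_interval` and the numeric inputs are hypothetical, and the critical value is passed in by the caller because the Python standard library has no $F$ quantile function.

```python
import math


def scheffe_interval(estimate, quad_form, q, mse, f_crit):
    """Scheffe interval (8.97) for a single estimable a'beta.

    estimate  -- a' beta_hat
    quad_form -- a' (X'X)^- a
    q         -- dimension of the subspace of estimable functions being covered
    mse       -- error mean square MS_E from (8.98)
    f_crit    -- upper alpha point F_{alpha, q, n-r}, supplied by the caller
    """
    half_width = math.sqrt(q * mse * f_crit) * math.sqrt(quad_form)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs chosen so the arithmetic is easy to follow:
# half-width = sqrt(2 * 2.0 * 4.0) * sqrt(0.25) = 4 * 0.5 = 2.
lower, upper = scheffe_interval(estimate=3.0, quad_form=0.25, q=2, mse=2.0, f_crit=4.0)
print(lower, upper)
```

In practice `f_crit` would come from an $F$ table or, if SciPy is available, from `scipy.stats.f.ppf(1 - alpha, q, n - r)`.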

A proof of this theorem is given in Scheffé (1959, Section 3.5). Another proof 
is presented here using the method of Lagrange multipliers. This proof is 
based on the following lemma: 

Lemma 8.10.1. Let $C$ be the set $\{x \in R^q \mid x'Ax \le 1\}$, where $A$ is a positive definite matrix of order $q \times q$. Then $x \in C$ if and only if $|l'x| \le (l'A^{-1}l)^{1/2}$ for all $l \in R^q$.

Proof. Suppose that $x \in C$. Since $A$ is positive definite, the boundary of $C$ is an ellipsoid in a $q$-dimensional space. For any $l \in R^q$, let $e$ be a unit vector in its direction. The projection of $x$ on an axis in the direction of $l$ is given by $e'x$. Consider optimizing $e'x$ with respect to $x$ over the set $C$. The minimum and maximum values of $e'x$ are obviously determined by the end points of the projection of $C$ on the $l$-axis. This is equivalent to optimizing $e'x$ subject to the constraint $x'Ax = 1$, since the projection of $C$ on the $l$-axis is the same as the projection of its boundary, the ellipsoid $x'Ax = 1$. This constrained optimization problem can be solved by using the method of Lagrange multipliers.

Let $G = e'x + \lambda(x'Ax - 1)$, where $\lambda$ is a Lagrange multiplier. By differentiating $G$ with respect to $x_1, x_2, \ldots, x_q$, where $x_i$ is the $i$th element of $x$ $(i = 1, 2, \ldots, q)$, and equating the derivatives to zero, we obtain the equation $e + 2\lambda Ax = 0$, whose solution is $x = -(1/2\lambda)A^{-1}e$. If we substitute this value of $x$ into the equation $x'Ax = 1$ and then solve for $\lambda$, we obtain the two solutions $\lambda_1 = -\frac{1}{2}(e'A^{-1}e)^{1/2}$, $\lambda_2 = \frac{1}{2}(e'A^{-1}e)^{1/2}$. But $e'x = -2\lambda$, since $x'Ax = 1$. It follows that the minimum and maximum values of $e'x$ under the constraint $x'Ax = 1$ are $-(e'A^{-1}e)^{1/2}$ and $(e'A^{-1}e)^{1/2}$, respectively. Hence,

$$|e'x| \le (e'A^{-1}e)^{1/2}. \qquad (8.100)$$

Since $l = \|l\|_2\, e$, where $\|l\|_2$ is the Euclidean norm of $l$, multiplying the two sides of inequality (8.100) by $\|l\|_2$ yields

$$|l'x| \le (l'A^{-1}l)^{1/2}. \qquad (8.101)$$

Vice versa, if inequality (8.101) is true for all $l \in R^q$, then by choosing $l' = x'A$ we obtain

$$|x'Ax| \le (x'AA^{-1}Ax)^{1/2},$$

which is equivalent to $x'Ax \le 1$, that is, $x \in C$. □
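The Lagrange-multiplier argument can be checked numerically in two dimensions. The Python sketch below uses a hypothetical positive definite $A$ and direction $l$ (not taken from the text); it confirms that the stationary point $x^* = A^{-1}l/(l'A^{-1}l)^{1/2}$ lies on the ellipse $x'Ax = 1$ and attains $(l'A^{-1}l)^{1/2}$, and that sampled boundary points never exceed this bound.

```python
import math
import random


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


A = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical positive definite matrix (q = 2)
l = [1.0, -3.0]                # hypothetical direction l
A_inv = inv2(A)
bound = math.sqrt(dot(l, matvec(A_inv, l)))   # (l' A^{-1} l)^{1/2}

# Maximizer from the Lagrange-multiplier argument, written in terms of l:
# x* = A^{-1} l / (l' A^{-1} l)^{1/2}, which lies on the boundary x'Ax = 1.
x_star = [v / bound for v in matvec(A_inv, l)]

# Brute force: sample boundary points by rescaling random unit directions d
# so that x'Ax = 1, and record the largest value of l'x.
random.seed(0)
best = -math.inf
for _ in range(20000):
    t = random.uniform(0.0, 2.0 * math.pi)
    d = [math.cos(t), math.sin(t)]
    x = [di / math.sqrt(dot(d, matvec(A, d))) for di in d]
    best = max(best, dot(l, x))

print(bound, dot(l, x_star), best)
```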

Proof of Theorem 8.10.1. Let $L$ be a $q \times p$ matrix of rank $q$ whose rows form a basis for the $q$-dimensional subspace $\mathcal{L}$ of $\rho(X)$. Since $y$ in model (8.96) is distributed as $N(X\beta, \sigma^2 I_n)$, $L\hat{\beta} = L(X'X)^{-}X'y$ is distributed as $N[L\beta, \sigma^2 L(X'X)^{-}L']$. Thus the random variable

$$\frac{[L(\hat{\beta}-\beta)]'[L(X'X)^{-}L']^{-1}[L(\hat{\beta}-\beta)]}{q\,MS_E}$$

has the $F$-distribution with $q$ and $n-r$ degrees of freedom (see, for example, Searle, 1971, page 190). It follows that

$$P\{[L(\hat{\beta}-\beta)]'[L(X'X)^{-}L']^{-1}[L(\hat{\beta}-\beta)] \le q\,MS_E\,F_{\alpha,q,n-r}\} = 1-\alpha. \qquad (8.102)$$

By applying Lemma 8.10.1 to formula (8.102) with $x = L(\hat{\beta}-\beta)$ and $A = [L(X'X)^{-}L']^{-1}/(q\,MS_E\,F_{\alpha,q,n-r})$, we obtain the equivalent probability statement

$$P\{|l'L(\hat{\beta}-\beta)| \le (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[l'L(X'X)^{-}L'l]^{1/2}\ \ \forall\, l \in R^q\} = 1-\alpha.$$

Let $a' = l'L$. We then have

$$P\{|a'(\hat{\beta}-\beta)| \le (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[a'(X'X)^{-}a]^{1/2}\ \ \forall\, a' \in \mathcal{L}\} = 1-\alpha.$$

We conclude that the values of $a'\beta$ satisfy the double inequality (8.99) for all $a' \in \mathcal{L}$ with probability $1-\alpha$. Simultaneous $(1-\alpha)100\%$ confidence intervals on $a'\beta$ are therefore given by formula (8.97). We refer to these intervals as Scheffé’s confidence intervals.

Theorem 8.10.1 can be used to obtain simultaneous confidence intervals on all contrasts among the elements of $\beta$. By definition, the linear function $a'\beta$ is a contrast among the elements of $\beta$ if $\sum_{i=1}^{p} a_i = 0$, where $a_i$ is the $i$th element of $a$ $(i = 1, 2, \ldots, p)$. If $a'$ is in the row space of $X$, then it must belong to a $q$-dimensional subspace of $\rho(X)$, where $q = r-1$. Hence, simultaneous $(1-\alpha)100\%$ confidence intervals on all such contrasts can be obtained from formula (8.97) by replacing $q$ with $r-1$. □


8.10.1. The Relation of Scheffé’s Confidence Intervals to the F-Test 

There is a relationship between the confidence intervals (8.97) and the $F$-test used to test the hypothesis $H_0: L\beta = 0$ versus $H_a: L\beta \ne 0$, where $L$ is the matrix whose rows form a basis for the $q$-dimensional subspace $\mathcal{L}$ of $\rho(X)$. The test statistic for testing $H_0$ is given by (see Searle, 1971, Section 5.5)

$$F = \frac{\hat{\beta}'L'[L(X'X)^{-}L']^{-1}L\hat{\beta}}{q\,MS_E},$$

which under $H_0$ has the $F$-distribution with $q$ and $n-r$ degrees of freedom. The hypothesis $H_0$ can be rejected at the $\alpha$-level of significance if $F > F_{\alpha,q,n-r}$. In this case, by Lemma 8.10.1, there exists at least one $l \in R^q$ such that

$$|l'L\hat{\beta}| > (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[l'L(X'X)^{-}L'l]^{1/2}. \qquad (8.103)$$

It follows that the $F$-test rejects $H_0$ if and only if there exists a linear combination $a'\beta$, where $a' = l'L$ for some $l \in R^q$, for which the confidence interval in formula (8.97) does not contain the value zero. In this case, $a'\hat{\beta}$ is said to be significantly different from zero.

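It is easy to see that inequality (8.103) holds for some $l \in R^q$ if and only if

$$\sup_{l \in R^q} \frac{|l'L\hat{\beta}|^2}{l'L(X'X)^{-}L'l} > q\,MS_E\,F_{\alpha,q,n-r},$$

or equivalently,

$$\sup_{l \in R^q} \frac{l'G_1 l}{l'G_2 l} > q\,MS_E\,F_{\alpha,q,n-r}, \qquad (8.104)$$

where

$$G_1 = L\hat{\beta}\hat{\beta}'L', \qquad (8.105)$$

$$G_2 = L(X'X)^{-}L'. \qquad (8.106)$$

However, by Theorem 2.3.17,

$$\sup_{l \in R^q} \frac{l'G_1 l}{l'G_2 l} = e_{\max}(G_2^{-1}G_1) = \hat{\beta}'L'[L(X'X)^{-}L']^{-1}L\hat{\beta}, \qquad (8.107)$$

where $e_{\max}(G_2^{-1}G_1)$ is the largest eigenvalue of $G_2^{-1}G_1$. The second equality in (8.107) is true because the nonzero eigenvalues of $[L(X'X)^{-}L']^{-1}L\hat{\beta}\hat{\beta}'L'$ are the same as those of $\hat{\beta}'L'[L(X'X)^{-}L']^{-1}L\hat{\beta}$ by Theorem 2.3.9. Note that the latter expression is the numerator sum of squares of the $F$-test statistic for $H_0$.

The eigenvector of $G_2^{-1}G_1$ corresponding to $e_{\max}(G_2^{-1}G_1)$ is of special interest. Let $l^*$ be such an eigenvector. Then

$$\frac{l^{*\prime}G_1 l^*}{l^{*\prime}G_2 l^*} = e_{\max}(G_2^{-1}G_1). \qquad (8.108)$$

This follows from the fact that $l^*$ satisfies the equation

$$(G_1 - e_{\max}G_2)\,l^* = 0,$$

where $e_{\max}$ is an abbreviation for $e_{\max}(G_2^{-1}G_1)$. It is easy to see that $l^*$ can be chosen to be the vector $G_2^{-1}L\hat{\beta}$, since

$$G_2^{-1}G_1(G_2^{-1}L\hat{\beta}) = G_2^{-1}L\hat{\beta}\hat{\beta}'L'(G_2^{-1}L\hat{\beta}) = (\hat{\beta}'L'G_2^{-1}L\hat{\beta})\,G_2^{-1}L\hat{\beta} = e_{\max}(G_2^{-1}G_1)\,G_2^{-1}L\hat{\beta}.$$

This shows that $G_2^{-1}L\hat{\beta}$ is an eigenvector of $G_2^{-1}G_1$ for the eigenvalue $e_{\max}(G_2^{-1}G_1)$.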
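The eigenvector argument can be verified on a small example. The Python sketch below uses a hypothetical $2 \times 2$ matrix playing the role of $G_2 = L(X'X)^{-}L'$ and a hypothetical vector playing the role of $\hat{\gamma} = L\hat{\beta}$ (neither comes from a real model). It checks that $l^* = G_2^{-1}\hat{\gamma}$ satisfies the eigen-equation, that its ratio equals $\hat{\beta}'L'G_2^{-1}L\hat{\beta}$ as in (8.107) and (8.108), and that no randomly drawn $l$ gives a larger ratio.

```python
import random


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


G2 = [[0.5, 0.2], [0.2, 0.3]]   # hypothetical L (X'X)^- L' (positive definite)
gamma_hat = [1.2, -0.4]         # hypothetical L beta_hat


def ratio(l):
    """l'G1 l / l'G2 l with G1 = gamma_hat gamma_hat' as in (8.105)."""
    return dot(l, gamma_hat) ** 2 / dot(l, matvec(G2, l))


G2_inv = inv2(G2)
l_star = matvec(G2_inv, gamma_hat)   # l* = G2^{-1} L beta_hat
e_max = dot(gamma_hat, l_star)       # beta_hat' L' G2^{-1} L beta_hat, cf. (8.107)

# Eigen-equation check: G2^{-1} G1 l* = G2^{-1} gamma_hat (gamma_hat' l*) = e_max l*.
lhs = [v * dot(gamma_hat, l_star) for v in matvec(G2_inv, gamma_hat)]

random.seed(1)
best_random = max(ratio([random.gauss(0, 1), random.gauss(0, 1)]) for _ in range(5000))
print(e_max, ratio(l_star), best_random)
```

Because $G_1$ has rank one, $e_{\max}$ is the only nonzero eigenvalue of $G_2^{-1}G_1$, which is why the closed form $\hat{\gamma}'G_2^{-1}\hat{\gamma}$ suffices here.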

From inequality (8.104) and formula (8.108) we conclude that if the $F$-test rejects $H_0$ at the $\alpha$-level, then

$$|l^{*\prime}L\hat{\beta}| > (q\,MS_E\,F_{\alpha,q,n-r})^{1/2}[l^{*\prime}L(X'X)^{-}L'l^*]^{1/2}.$$

This means that the linear combination $a^{*\prime}\beta$, where $a^{*\prime} = l^{*\prime}L$, is significantly different from zero. Let us express $a^{*\prime}\hat{\beta}$ as

$$l^{*\prime}L\hat{\beta} = \sum_{i=1}^{q} l_i^*\hat{\gamma}_i, \qquad (8.109)$$

where $l_i^*$ and $\hat{\gamma}_i$ are the $i$th elements of $l^*$ and $\hat{\gamma} = L\hat{\beta}$, respectively $(i = 1, 2, \ldots, q)$. If we divide $\hat{\gamma}_i$ by its estimated standard error $\hat{\kappa}_i$ [which is equal to the square root of the $i$th diagonal element of the variance-covariance matrix of $L\hat{\beta}$, namely, $\sigma^2 L(X'X)^{-}L'$ with $\sigma^2$ replaced by the error mean square $MS_E$ in formula (8.98)], then formula (8.109) can be written as

$$l^{*\prime}L\hat{\beta} = \sum_{i=1}^{q} l_i^*\hat{\kappa}_i\hat{\tau}_i, \qquad (8.110)$$

where $\hat{\tau}_i = \hat{\gamma}_i/\hat{\kappa}_i$. Consequently, large values of $|l_i^*|\hat{\kappa}_i$ identify those elements of $\hat{\gamma}$ that are influential contributors to the significance of the $F$-test concerning $H_0$. Note that the elements of $\gamma = L\beta$ form a set of linearly independent estimable linear functions of $\beta$.

We conclude from the previous arguments that the eigenvector $l^*$, which corresponds to the largest eigenvalue of $G_2^{-1}G_1$, can be conveniently used to identify an estimable linear function of $\beta$ that is significantly different from zero whenever the $F$-test rejects $H_0$.

It should be noted that if model (8.96) is a response surface model (in this case, the matrix $X$ in the model is of full column rank, that is, $r = p$) whose input variables, $x_1, x_2, \ldots, x_k$, have different units of measurement, then these variables must be made scale free. This is accomplished as follows: If $x_{ui}$ denotes the $u$th measurement on $x_i$, then we may consider the transformation

$$z_{ui} = \frac{x_{ui} - \bar{x}_i}{s_i}, \qquad i = 1, 2, \ldots, k;\ u = 1, 2, \ldots, n,$$

where $\bar{x}_i = (1/n)\sum_{u=1}^{n} x_{ui}$, $s_i = [\sum_{u=1}^{n}(x_{ui} - \bar{x}_i)^2]^{1/2}$, and $n$ is the total number of observations. One advantage of this scaling convention, besides making the input variables scale free, is that it can greatly improve the conditioning of the matrix $X$ with regard to multicollinearity (see, for example, Belsley, Kuh, and Welsch, 1980, pages 183-185).
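The transformation above centers each input and rescales it to unit length. The Python sketch below applies it to two hypothetical input columns measured in very different units (the variable names and values are illustrative only). After scaling, each column sums to zero and has unit sum of squares, so the cross-product $\sum_u z_{u1}z_{u2}$ is the sample correlation between the two inputs.

```python
import math


def unit_length_scale(columns):
    """Center each column and divide by the square root of its corrected sum of squares."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in col))
        scaled.append([(x - mean) / s for x in col])
    return scaled


# Hypothetical measurements of two inputs in very different units.
temperature = [150.0, 160.0, 170.0, 180.0]
pressure = [0.010, 0.030, 0.020, 0.040]
z_temp, z_press = unit_length_scale([temperature, pressure])

cross = sum(u * v for u, v in zip(z_temp, z_press))  # sample correlation of the inputs
print(z_temp, z_press, cross)
```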

Example 8.10.1. Let us consider the one-way classification model

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n_i, \qquad (8.111)$$

where $\mu$ and $\alpha_i$ are unknown parameters with the latter representing the effect of the $i$th level of a certain factor at $m$ levels; $n_i$ observations are obtained at the $i$th level. The $\epsilon_{ij}$'s are random errors assumed to be independent and normally distributed with zero means and a common variance $\sigma^2$.

Model (8.111) can be represented in vector form as model (8.96). Here, $y = (y_{11}, y_{12}, \ldots, y_{1n_1}, y_{21}, y_{22}, \ldots, y_{2n_2}, \ldots, y_{m1}, y_{m2}, \ldots, y_{mn_m})'$, $\beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_m)'$, and $X$ is of order $n \times (m+1)$ of the form $X = [1_n : T]$, where $1_n$ is a vector of ones of order $n \times 1$, $n = \sum_{i=1}^{m} n_i$, and $T = \operatorname{Diag}(1_{n_1}, 1_{n_2}, \ldots, 1_{n_m})$. The rank of $X$ is $r = m$. For such a model, the hypothesis of interest is

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_m,$$

which can be expressed as $H_0: L\beta = 0$, where $L$ is a matrix of order $(m-1) \times (m+1)$ and rank $m-1$ of the form

$$L = \begin{bmatrix} 0 & 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & 0 & 0 & \cdots & -1 \end{bmatrix}.$$

This hypothesis states that the factor under consideration has no effect on the response. Note that each row of $L$ is a linear combination of the rows of $X$. For example, the $i$th row of $L$, whose elements are equal to zero except for the second and the $(i+2)$th elements, which are equal to $1$ and $-1$, respectively, is the difference between rows $1$ and $\nu_i + 1$ of $X$, where $\nu_i = \sum_{j=1}^{i} n_j$, $i = 1, 2, \ldots, m-1$. Thus the rows of $L$ form a basis for a $q$-dimensional subspace $\mathcal{L}$ of $\rho(X)$, the row space of $X$, where $q = m-1$.

Let $\mu_i = \mu + \alpha_i$. Then $\mu_i$ is the mean of the $i$th level of the factor $(i = 1, 2, \ldots, m)$. Consider the contrast $\psi = \sum_{i=1}^{m} c_i\mu_i$, that is, $\sum_{i=1}^{m} c_i = 0$. We can write $\psi = a'\beta$, where $a' = (0, c_1, c_2, \ldots, c_m)$ belongs to a $q$-dimensional subspace of $R^{m+1}$. This subspace is the same as $\mathcal{L}$, since each row of $L$ is of the form $(0, c_1, c_2, \ldots, c_m)$ with $\sum_{i=1}^{m} c_i = 0$. Vice versa, if $\psi = a'\beta$ is such that $a' = (0, c_1, c_2, \ldots, c_m)$ with $\sum_{i=1}^{m} c_i = 0$, then $a'$ can be expressed as

$$a' = (-c_2, -c_3, \ldots, -c_m)L,$$

since $c_1 = -\sum_{i=2}^{m} c_i$. Hence, $a' \in \mathcal{L}$. It follows that $\mathcal{L}$ is a subspace associated with all contrasts among the means $\mu_1, \mu_2, \ldots, \mu_m$ of the $m$ levels of the factor.

Simultaneous $(1-\alpha)100\%$ confidence intervals on all contrasts of the form $\psi = \sum_{i=1}^{m} c_i\mu_i$ can be obtained by applying formula (8.97). Here, $q = m-1$, $r = m$, and a generalized inverse of $X'X$ is of the form

$$(X'X)^{-} = \begin{bmatrix} 0 & 0' \\ 0 & D \end{bmatrix},$$

where $D = \operatorname{Diag}(n_1^{-1}, n_2^{-1}, \ldots, n_m^{-1})$ and $0$ is a zero vector of order $m \times 1$. Hence,

$$a'\hat{\beta} = (0, c_1, c_2, \ldots, c_m)(X'X)^{-}X'y = \sum_{i=1}^{m} c_i\bar{y}_{i.},$$

where $\bar{y}_{i.} = (1/n_i)\sum_{j=1}^{n_i} y_{ij}$, $i = 1, 2, \ldots, m$. Furthermore,

$$a'(X'X)^{-}a = \sum_{i=1}^{m} \frac{c_i^2}{n_i}.$$

By making the substitution in formula (8.97) we obtain

$$\sum_{i=1}^{m} c_i\bar{y}_{i.} \mp [(m-1)\,MS_E\,F_{\alpha,m-1,n-m}]^{1/2}\left(\sum_{i=1}^{m}\frac{c_i^2}{n_i}\right)^{1/2}. \qquad (8.112)$$
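Interval (8.112) needs only the level means, the sample sizes, and $MS_E$. The Python sketch below computes it for a small hypothetical data set with three levels; the helper names are illustrative, and the value 5.14 is used as an approximation to the tabulated upper 5% point of $F$ with 2 and 6 degrees of freedom, supplied by hand because the standard library has no $F$ quantile.

```python
import math


def group_means_and_mse(groups):
    """Return the level means ybar_i. and MS_E for the one-way model (8.111)."""
    means = [sum(g) / len(g) for g in groups]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    n = sum(len(g) for g in groups)
    return means, sse / (n - len(groups))


def scheffe_contrast(groups, c, f_crit):
    """Interval (8.112) for sum c_i mu_i; f_crit is F_{alpha, m-1, n-m} from the caller."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means, mse = group_means_and_mse(groups)
    m = len(groups)
    estimate = sum(ci * yi for ci, yi in zip(c, means))
    se_part = math.sqrt(sum(ci ** 2 / len(g) for ci, g in zip(c, groups)))
    half = math.sqrt((m - 1) * mse * f_crit) * se_part
    return estimate - half, estimate + half


groups = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0], [4.0, 5.0, 6.0]]   # hypothetical data
lower, upper = scheffe_contrast(groups, [1.0, -1.0, 0.0], f_crit=5.14)
print(lower, upper)   # an interval that excludes zero flags mu_1 - mu_2
```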


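Now, if the $F$-test rejects $H_0$ at the $\alpha$-level, then there exists a contrast $\sum_{i=1}^{m} c_i^*\bar{y}_{i.} = a^{*\prime}\hat{\beta}$ which is significantly different from zero, that is, the interval (8.112) for $c_i = c_i^*$ $(i = 1, 2, \ldots, m)$ does not contain the value zero. Here, $a^{*\prime} = l^{*\prime}L$, where $l^* = G_2^{-1}L\hat{\beta}$ is an eigenvector of $G_2^{-1}G_1$ corresponding to $e_{\max}(G_2^{-1}G_1)$. We have that

$$G_2^{-1} = [L(X'X)^{-}L']^{-1} = \left[\frac{1}{n_1}J_{m-1} + \Lambda\right]^{-1},$$

where $J_{m-1}$ is a matrix of ones of order $(m-1) \times (m-1)$, and $\Lambda = \operatorname{Diag}(n_2^{-1}, n_3^{-1}, \ldots, n_m^{-1})$. Applying the Sherman-Morrison-Woodbury formula (see Exercise 2.15), we obtain

$$\left[\frac{1}{n_1}J_{m-1} + \Lambda\right]^{-1} = \Lambda^{-1} - \frac{\Lambda^{-1}1_{m-1}1'_{m-1}\Lambda^{-1}}{n_1 + 1'_{m-1}\Lambda^{-1}1_{m-1}} = \operatorname{Diag}(n_2, n_3, \ldots, n_m) - \frac{1}{n}(n_2, n_3, \ldots, n_m)'(n_2, n_3, \ldots, n_m).$$

Also,

$$\hat{\gamma} = L\hat{\beta} = L(X'X)^{-}X'y = (\bar{y}_{1.} - \bar{y}_{2.},\ \bar{y}_{1.} - \bar{y}_{3.},\ \ldots,\ \bar{y}_{1.} - \bar{y}_{m.})'.$$

It can be verified that the $i$th element of $l^* = G_2^{-1}\hat{\gamma}$ is given by

$$l_i^* = n_{i+1}(\bar{y}_{1.} - \bar{y}_{i+1.}) - \frac{n_{i+1}}{n}\sum_{j=2}^{m} n_j(\bar{y}_{1.} - \bar{y}_{j.}), \qquad i = 1, 2, \ldots, m-1. \qquad (8.113)$$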
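Both the Sherman-Morrison-Woodbury expression for $G_2^{-1}$ and formula (8.113) can be checked directly for unequal sample sizes. The Python sketch below uses hypothetical sample sizes and level means (illustrative values, not from the text). It confirms that the closed-form inverse times $G_2 = (1/n_1)J_{m-1} + \Lambda$ gives the identity, and that $G_2^{-1}\hat{\gamma}$ matches (8.113) element by element.

```python
def g2(n):
    """G2 = L (X'X)^- L' = (1/n_1) J_{m-1} + Diag(1/n_2, ..., 1/n_m)."""
    tail = n[1:]
    k = len(tail)
    return [[1.0 / n[0] + (1.0 / tail[i] if i == j else 0.0) for j in range(k)]
            for i in range(k)]


def g2_inverse_closed_form(n):
    """Sherman-Morrison-Woodbury result: Diag(n_2..n_m) - (1/n) t t', t = (n_2..n_m)'."""
    tail = n[1:]
    total = sum(n)
    k = len(tail)
    return [[(tail[i] if i == j else 0.0) - tail[i] * tail[j] / total for j in range(k)]
            for i in range(k)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


n = [3, 5, 2, 4]                  # hypothetical unequal sample sizes n_1, ..., n_m
ybar = [10.0, 12.5, 9.0, 11.0]    # hypothetical level means ybar_1., ..., ybar_m.
total = sum(n)

gamma_hat = [ybar[0] - y for y in ybar[1:]]   # gamma_hat = L beta_hat
G2_inv = g2_inverse_closed_form(n)
l_star = matvec(G2_inv, gamma_hat)            # l* = G2^{-1} gamma_hat

# Formula (8.113), element by element.
s = sum(n[j] * (ybar[0] - ybar[j]) for j in range(1, len(n)))
l_formula = [n[i + 1] * (ybar[0] - ybar[i + 1]) - n[i + 1] / total * s
             for i in range(len(n) - 1)]

product = matmul(g2(n), G2_inv)               # should be the (m-1) x (m-1) identity
print(l_star, l_formula)
```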



The estimated standard error, $\hat{\kappa}_i$, of the $i$th element of $\hat{\gamma}$ $(i = 1, 2, \ldots, m-1)$ is the square root of the $i$th diagonal element of $L(X'X)^{-}L'\,MS_E = [(1/n_1)J_{m-1} + \Lambda]\,MS_E$, that is,

$$\hat{\kappa}_i = \left[\left(\frac{1}{n_1} + \frac{1}{n_{i+1}}\right)MS_E\right]^{1/2}, \qquad i = 1, 2, \ldots, m-1.$$

Thus by formula (8.110), large values of $|l_i^*|\hat{\kappa}_i$ identify those elements of $\hat{\gamma} = L\hat{\beta}$ that are influential contributors to the significance of the $F$-test. In particular, if the data set used to analyze model (8.111) is balanced, that is, $n_i = n/m$ for $i = 1, 2, \ldots, m$, then

$$|l_i^*|\hat{\kappa}_i = |\bar{y}_{..} - \bar{y}_{i+1.}|\left(\frac{2n}{m}MS_E\right)^{1/2}, \qquad i = 1, 2, \ldots, m-1,$$

where $\bar{y}_{..} = (1/m)\sum_{i=1}^{m}\bar{y}_{i.}$.


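Alternatively, the contrast $a^{*\prime}\hat{\beta}$ can be expressed as

$$a^{*\prime}\hat{\beta} = l^{*\prime}L(X'X)^{-}X'y = \left(0, \sum_{i=1}^{m-1} l_i^*, -l_1^*, -l_2^*, \ldots, -l_{m-1}^*\right)(X'X)^{-}X'y = \sum_{i=1}^{m} c_i^*\bar{y}_{i.},$$

where $l_i^*$ is given in formula (8.113) and

$$c_i^* = \begin{cases} \sum_{j=1}^{m-1} l_j^*, & i = 1, \\ -l_{i-1}^*, & i = 2, 3, \ldots, m. \end{cases} \qquad (8.114)$$

Since the estimated standard error of $\bar{y}_{i.}$ is $(MS_E/n_i)^{1/2}$, $i = 1, 2, \ldots, m$, by dividing $\bar{y}_{i.}$ by this value we obtain

$$a^{*\prime}\hat{\beta} = \sum_{i=1}^{m}\left(\frac{MS_E}{n_i}\right)^{1/2} c_i^* w_i,$$

where $w_i = \bar{y}_{i.}/(MS_E/n_i)^{1/2}$ is a scaled value of $\bar{y}_{i.}$ $(i = 1, 2, \ldots, m)$. Hence, large values of $(MS_E/n_i)^{1/2}|c_i^*|$ identify those $\bar{y}_{i.}$'s that contribute significantly to the rejection of $H_0$. In particular, for a balanced data set,

$$l_i^* = \frac{n}{m}(\bar{y}_{..} - \bar{y}_{i+1.}), \qquad i = 1, 2, \ldots, m-1.$$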





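Thus from formula (8.114) we get

$$\left(\frac{MS_E}{n_i}\right)^{1/2}|c_i^*| = \left(\frac{n}{m}MS_E\right)^{1/2}|\bar{y}_{i.} - \bar{y}_{..}|, \qquad i = 1, 2, \ldots, m.$$

We conclude that large values of $|\bar{y}_{i.} - \bar{y}_{..}|$ are responsible for the rejection of $H_0$ by the $F$-test. This is consistent with the fact that the numerator sum of squares of the $F$-test statistic for $H_0$ is proportional to $\sum_{i=1}^{m}(\bar{y}_{i.} - \bar{y}_{..})^2$ when the data set is balanced.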
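The balanced-case conclusions can be confirmed numerically. The Python sketch below takes hypothetical level means and a hypothetical $MS_E$ with $n_i = n/m$, builds $l^*$ from (8.113) and $c^*$ from (8.114), and checks that the $c_i^*$ sum to zero (so $a^{*\prime}\beta$ is a contrast), that $c_i^* = (n/m)(\bar{y}_{i.} - \bar{y}_{..})$, and that the weights $(MS_E/n_i)^{1/2}|c_i^*|$ reduce to $(n\,MS_E/m)^{1/2}|\bar{y}_{i.} - \bar{y}_{..}|$.

```python
import math


def l_star_from_means(n, ybar):
    """Elements l_i^* of formula (8.113), i = 1, ..., m-1."""
    total = sum(n)
    s = sum(n[j] * (ybar[0] - ybar[j]) for j in range(1, len(n)))
    return [n[i + 1] * (ybar[0] - ybar[i + 1]) - n[i + 1] / total * s
            for i in range(len(n) - 1)]


def c_star(l_star):
    """Contrast coefficients c_i^* of formula (8.114)."""
    return [sum(l_star)] + [-v for v in l_star]


m, per_group = 4, 5
n = [per_group] * m                # balanced data: n_i = n/m
total_n = sum(n)
ybar = [10.0, 12.5, 9.0, 11.0]     # hypothetical level means ybar_i.
mse = 2.0                          # hypothetical MS_E
grand = sum(ybar) / m              # ybar_.. = (1/m) * sum of the level means

c = c_star(l_star_from_means(n, ybar))
weights = [math.sqrt(mse / ni) * abs(ci) for ni, ci in zip(n, c)]
print(c, weights)
```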


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Adby, P. R., and M. A. H. Dempster (1974). Introduction to Optimization Methods. Chapman and Hall, London. (This book is an introduction to nonlinear methods of optimization. It covers basic optimization techniques such as steepest descent and the Newton-Raphson method.)

Ash, A., and A. Hedayat (1978). “An introduction to design optimality with an overview of the literature.” Comm. Statist. Theory Methods, 7, 1295-1325.

Atkinson, A. C. (1982). “Developments in the design of experiments.” Internat. Statist. Rev., 50, 161-177.

Atkinson, A. C. (1988). “Recent developments in the methods of optimum and related experimental designs.” Internat. Statist. Rev., 56, 99-115.

Bates, D. M., and D. G. Watts (1988). Nonlinear Regression Analysis and its Applications. Wiley, New York. (Estimation of parameters in a nonlinear model is addressed in Chaps. 2 and 3. Design aspects for nonlinear models are briefly discussed in Section 3.14.)

Bayne, C. K., and I. B. Rubin (1986). Practical Experimental Designs and Optimization Methods for Chemists. VCH Publishers, Deerfield Beach, Florida. (Steepest ascent and the simplex method are discussed in Chap. 5. A bibliography of optimization and response surface methods, as actually applied in 17 major fields of chemistry, is provided in Chap. 7.)

Belsley, D. A., E. Kuh, and R. E. Welsch (1980). Regression Diagnostics. Wiley, New York. (Chap. 3 is devoted to the diagnosis of multicollinearity among the columns of the matrix in a regression model. Multicollinearity renders the model’s least-squares parameter estimates less precise and less useful than would otherwise be the case.)

Biles, W. E., and J. J. Swain (1980). Optimization and Industrial Experimentation. Wiley-Interscience, New York. (Chaps. 4 and 5 discuss optimization techniques that are directly applicable in response surface methodology.)

Bohachevsky, I. O., M. E. Johnson, and M. L. Stein (1986). “Generalized simulated annealing for function optimization.” Technometrics, 28, 209-217.

Box, G. E. P. (1982). “Choice of response surface design and alphabetic optimality.” Utilitas Math., 21B, 11-55.

Box, G. E. P., and D. W. Behnken (1960). “Some new three level designs for the study of quantitative variables.” Technometrics, 2, 455-475.





Box, G. E. P., and N. R. Draper (1959). “A basis for the selection of a response surface design.” J. Amer. Statist. Assoc., 54, 622-654.

Box, G. E. P., and N. R. Draper (1963). “The choice of a second order rotatable design.” Biometrika, 50, 335-352.

Box, G. E. P., and N. R. Draper (1965). “The Bayesian estimation of common parameters from several responses.” Biometrika, 52, 355-365.

Box, G. E. P., and N. R. Draper (1987). Empirical Model-Building and Response Surfaces. Wiley, New York. (Chap. 9 introduces the exploration of maxima with second-order models; the alphabetic optimality approach is critically considered in Chap. 14. Many examples are given throughout the book.)

Box, G. E. P., and H. L. Lucas (1959). “Design of experiments in nonlinear situations.” Biometrika, 46, 77-90.

Box, G. E. P., and K. B. Wilson (1951). “On the experimental attainment of optimum conditions.” J. Roy. Statist. Soc. Ser. B, 13, 1-45.

Bunday, B. D. (1984). Basic Optimization Methods. Edward Arnold Ltd., Victoria, Australia. (Chaps. 3 and 4 discuss basic optimization techniques such as the Nelder-Mead simplex method and the Davidon-Fletcher-Powell method.)

Conlon, M. (1991). “The controlled random search procedure for function optimization.” Personal communication. (This is a FORTRAN file for implementing Price’s controlled random search procedure.)

Conlon, M., and A. I. Khuri (1992). “Multiple response optimization.” Technical Report, Department of Statistics, University of Florida, Gainesville, Florida.

Cook, R. D., and C. J. Nachtsheim (1980). “A comparison of algorithms for constructing exact D-optimal designs.” Technometrics, 22, 315-324.

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). “Maximum likelihood from incomplete data via the EM algorithm.” J. Roy. Statist. Soc. Ser. B, 39, 1-38.

Draper, N. R. (1963). “Ridge analysis of response surfaces.” Technometrics, 5, 469-479.

Everitt, B. S. (1987). Introduction to Optimization Methods and Their Application in Statistics. Chapman and Hall, London. (This book gives a brief introduction to optimization methods and their use in several areas of statistics. These include maximum likelihood estimation, nonlinear regression estimation, and applied multivariate analysis.)

Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York. (This book is a translation of a monograph in Russian. It presents the mathematical apparatus of experimental design for a regression model.)

Fichtali, J., F. R. Van De Voort, and A. I. Khuri (1990). “Multiresponse optimization of acid casein production.” J. Food Process Eng., 12, 247-258.

Fletcher, R. (1987). Practical Methods of Optimization, 2nd ed. Wiley, New York. (This book gives a detailed study of several unconstrained and constrained optimization techniques.)

Fletcher, R., and M. J. D. Powell (1963). “A rapidly convergent descent method for minimization.” Comput. J., 6, 163-168.

Hartley, H. O., and J. N. K. Rao (1967). “Maximum likelihood estimation for the mixed analysis of variance model.” Biometrika, 54, 93-108.





Hoerl, A. E. (1959). “Optimum solution of many variables equations.” Chem. Eng. Prog., 55, 69-78.

Huber, P. J. (1973). “Robust regression: Asymptotics, conjectures and Monte Carlo.” Ann. Statist., 1, 799-821.

Huber, P. J. (1981). Robust Statistics. Wiley, New York. (This book gives a solid foundation in robustness in statistics. Chap. 3 introduces and discusses M-estimation; Chap. 7 addresses M-estimation for a regression model.)

Johnson, M. E., and C. J. Nachtsheim (1983). “Some guidelines for constructing exact D-optimal designs on convex design spaces.” Technometrics, 25, 271-277.

Jones, E. R., and T. J. Mitchell (1978). “Design criteria for detecting model inadequacy.” Biometrika, 65, 541-551.

Karson, M. J., A. R. Manson, and R. J. Hader (1969). “Minimum bias estimation and experimental design for response surfaces.” Technometrics, 11, 461-475.

Khuri, A. I., and M. Conlon (1981). “Simultaneous optimization of multiple responses represented by polynomial regression functions.” Technometrics, 23, 363-375.

Khuri, A. I., and J. A. Cornell (1996). Response Surfaces, 2nd ed. Marcel Dekker, New York. (Optimization techniques in response surface methodology are discussed in Chap. 5.)

Khuri, A. I., and R. H. Myers (1979). “Modified ridge analysis.” Technometrics, 21, 467-473.

Khuri, A. I., and H. Sahai (1985). “Variance components analysis: A selective literature survey.” Internat. Statist. Rev., 53, 279-300.

Kiefer, J. (1958). “On the nonrandomized optimality and the randomized nonoptimality of symmetrical designs.” Ann. Math. Statist., 29, 675-699.

Kiefer, J. (1959). “Optimum experimental designs” (with discussion). J. Roy. Statist. Soc. Ser. B, 21, 272-319.

Kiefer, J. (1960). “Optimum experimental designs V, with applications to systematic and rotatable designs.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley, pp. 381-405.

Kiefer, J. (1961). “Optimum designs in regression problems II.” Ann. Math. Statist., 32, 298-325.

Kiefer, J. (1962a). “Two more criteria equivalent to D-optimality of designs.” Ann. Math. Statist., 33, 792-796.

Kiefer, J. (1962b). “An extremum result.” Canad. J. Math., 14, 597-601.

Kiefer, J. (1975). “Optimal design: Variation in structure and performance under change of criterion.” Biometrika, 62, 277-288.

Kiefer, J., and J. Wolfowitz (1960). “The equivalence of two extremum problems.” Canad. J. Math., 12, 363-366.

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi (1983). “Optimization by simulated annealing.” Science, 220, 671-680.

Little, R. J. A., and D. B. Rubin (1987). Statistical Analysis with Missing Data. Wiley, New York. (The theory of the EM algorithm is introduced in Chap. 7. The book presents a systematic approach to the analysis of data with missing values, where inferences are based on likelihoods derived from formal statistical models for the data.)





Lucas, J. M. (1976). “Which response surface design is best.” Technometrics, 18, 411-417.

Miller, R. G., Jr. (1981). Simultaneous Statistical Inference, 2nd ed. Springer-Verlag, New York. (Scheffé’s simultaneous confidence intervals are derived in Chap. 2.)

Milliken, G. A., and D. E. Johnson (1984). Analysis of Messy Data. Lifetime Learning Publications, Belmont, California. (This book presents several techniques and methods for analyzing unbalanced data.)

Mitchell, T. J. (1974). “An algorithm for the construction of D-optimal experimental designs.” Technometrics, 16, 203-210.

Myers, R. H. (1976). Response Surface Methodology. Author, Blacksburg, Virginia. (Chap. 5 discusses the determination of optimum operating conditions in response surface methodology; designs for fitting first-order and second-order models are discussed in Chaps. 6 and 7, respectively; Chap. 9 presents the J-criterion for choosing a response surface design.)

Myers, R. H. (1990). Classical and Modern Regression with Applications, 2nd ed. PWS-Kent, Boston. (Chap. 3 discusses the effects and hazards of multicollinearity in a regression model. Methods for detecting and combating multicollinearity are given in Chap. 8.)

Myers, R. H., and W. H. Carter, Jr. (1973). “Response surface techniques for dual response systems.” Technometrics, 15, 301-317.

Myers, R. H., and A. I. Khuri (1979). “A new procedure for steepest ascent.” Comm. Statist. Theory Methods, 8, 1359-1376.

Myers, R. H., A. I. Khuri, and W. H. Carter, Jr. (1989). “Response surface methodology: 1966-1988.” Technometrics, 31, 137-157.

Nelder, J. A., and R. Mead (1965). “A simplex method for function minimization.” Comput. J., 7, 308-313.

Nelson, L. S. (1973). “A sequential simplex procedure for non-linear least-squares estimation and other function minimization problems.” In 27th Annual Technical Conference Transaction, American Society for Quality Control, pp. 107-117.

Olsson, D. M., and L. S. Nelson (1975). “The Nelder-Mead simplex procedure for function minimization.” Technometrics, 17, 45-51.

Pazman, A. (1986). Foundations of Optimum Experimental Design. D. Reidel, Dordrecht, Holland.

Plackett, R. L., and J. P. Burman (1946). “The design of optimum multifactorial experiments.” Biometrika, 33, 305-325.

Price, W. L. (1977). “A controlled random search procedure for global optimization.” Comput. J., 20, 367-370.

Rao, C. R. (1970). “Estimation of heteroscedastic variances in linear models.” J. Amer. Statist. Assoc., 65, 161-172.

Rao, C. R. (1971). “Estimation of variance and covariance components: MINQUE theory.” J. Multivariate Anal., 1, 257-275.

Rao, C. R. (1972). “Estimation of variance and covariance components in linear models.” J. Amer. Statist. Assoc., 67, 112-115.

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, Reading, Massachusetts.





Rustagi, J. S., ed. (1979). Optimizing Methods in Statistics. Academic Press, New York.

Scheffé, H. (1959). The Analysis of Variance. Wiley, New York. (This classic book presents the basic theory of analysis of variance, mainly in the balanced case.)

Searle, S. R. (1971). Linear Models. Wiley, New York. (This book describes general procedures of estimation and hypothesis testing for linear models. Estimable linear functions for models that are not of full rank are discussed in Chap. 5.)

Seber, G. A. F. (1984). Multivariate Observations. Wiley, New York. (This book gives a comprehensive survey of the subject of multivariate analysis and provides many useful references.)

Silvey, S. D. (1980). Optimal Designs. Chapman and Hall, London.

Spendley, W., G. R. Hext, and F. R. Himsworth (1962). “Sequential application of simplex designs in optimization and evolutionary operation.” Technometrics, 4, 441-461.

St. John, R. C., and N. R. Draper (1975). “D-Optimality for regression designs: A review.” Technometrics, 17, 15-23.

Swallow, W. H., and S. R. Searle (1978). “Minimum variance quadratic unbiased estimation (MIVQUE) of variance components.” Technometrics, 20, 265-272.

Watson, G. S. (1964). “A note on maximum likelihood.” Sankhyā Ser. A, 26, 303-304.

Wynn, H. P. (1970). “The sequential generation of D-optimum experimental designs.” Ann. Math. Statist., 41, 1655-1664.

Wynn, H. P. (1972). “Results in the theory and construction of D-optimum experimental designs.” J. Roy. Statist. Soc. Ser. B, 34, 133-147.

Zanakis, S. H., and J. S. Rustagi, eds. (1982). Optimization in Statistics. North-Holland, Amsterdam, Holland. (This is Volume 19 in Studies in the Management Sciences. It contains 21 articles that address applications of optimization in three areas of statistics, namely, regression and correlation; multivariate data analysis and design of experiments; and statistical estimation, reliability, and quality control.)


EXERCISES 

8.1. Consider the function

$$f(x_1, x_2) = 8x_1^2 - 4x_1x_2 + 5x_2^2.$$

Minimize $f(x_1, x_2)$ using the method of steepest descent with $x_0 = (5, 2)'$ as an initial point.

8.2. Conduct a simulated steepest ascent exercise as follows: Use the function

$$\eta(x_1, x_2) = 47.9 + 3x_1 - x_2 + 4x_1^2 + 4x_1x_2 + 3x_2^2$$

as the true mean response, which depends on two input variables $x_1$ and $x_2$. Generate response values by using the model

$$y(x) = \eta(x) + \epsilon,$$

where $\epsilon$ has the normal distribution with mean $0$ and variance $2.25$, and $x = (x_1, x_2)'$. Fit a first-order model in $x_1$ and $x_2$ in a neighborhood of the origin using a $2^2$ factorial design along with the corresponding simulated response values. Make sure that replications are taken at the origin in order to test for lack of fit of the fitted model. Determine the path of steepest ascent, then proceed along it using simulated response values. Conduct additional experiments as described in Section 8.3.1.

8.3. Two types of fertilizers were applied to experimental plots to assess their effects on the yield of a certain variety of potato. The design settings used in the experiment along with the corresponding yield values are given in the following table:

$$
\begin{array}{ccccc}
\text{Fertilizer 1 (original)} & \text{Fertilizer 2 (original)} & x_1 \text{ (coded)} & x_2 \text{ (coded)} & \text{Yield } y \text{ (lb/plot)} \\
\hline
50.0 & 15.0 & -1 & -1 & 24.30 \\
120.0 & 15.0 & 1 & -1 & 35.82 \\
50.0 & 25.0 & -1 & 1 & 40.50 \\
120.0 & 25.0 & 1 & 1 & 50.94 \\
35.5 & 20.0 & -2^{1/2} & 0 & 30.60 \\
134.5 & 20.0 & 2^{1/2} & 0 & 42.90 \\
85.0 & 12.9 & 0 & -2^{1/2} & 22.50 \\
85.0 & 27.1 & 0 & 2^{1/2} & 50.40 \\
85.0 & 20.0 & 0 & 0 & 45.69
\end{array}
$$

(a) Fit a second-order model in the coded variables

$$x_1 = \frac{F_1 - 85}{35}, \qquad x_2 = \frac{F_2 - 20}{5}$$

to the yield data, where $F_1$ and $F_2$ are the original settings of fertilizers 1 and 2, respectively, used in the experiment.

(b) Apply the method of ridge analysis to determine the settings of the two fertilizers that are needed to maximize the predicted yield (in the space of the coded input variables, the region $R$ is the interior and boundary of a circle centered at the origin with a radius equal to $2^{1/2}$).


8.4. Suppose that $\lambda_1$ and $\lambda_2$ are two values of the Lagrange multiplier $\lambda$ used in the method of ridge analysis. Let $\hat{y}_1$ and $\hat{y}_2$ be the corresponding values of $\hat{y}$ on the two spheres $x'x = r_1^2$ and $x'x = r_2^2$, respectively. Show that if $r_1 = r_2$ and $\lambda_1 > \lambda_2$, then $\hat{y}_1 > \hat{y}_2$.

8.5. Consider again Exercise 8.4. Let $x_1$ and $x_2$ be the stationary points corresponding to $\lambda_1$ and $\lambda_2$, respectively. Consider also the matrix

$$M(x_i) = 2(B - \lambda_i I), \qquad i = 1, 2.$$

Show that if $r_1 = r_2$, $M(x_1)$ is positive definite, and $M(x_2)$ is indefinite, then $\hat{y}_1 < \hat{y}_2$.

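8.6. Consider once more the method of ridge analysis. Let $x$ be a stationary point that corresponds to the radius $r$.

(a) Show that

$$r^3\frac{d^2r}{d\lambda^2} = 2r^2\sum_{i=1}^{k}\left(\frac{dx_i}{d\lambda}\right)^2 + \left[r^2\sum_{i=1}^{k}\left(\frac{dx_i}{d\lambda}\right)^2 - \left(\sum_{i=1}^{k}x_i\frac{dx_i}{d\lambda}\right)^2\right],$$

where $x_i$ is the $i$th element of $x$ $(i = 1, 2, \ldots, k)$.

(b) Make use of part (a) to show that

$$\frac{d^2r}{d\lambda^2} > 0 \quad \text{if } r \ne 0.$$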


8.7. Suppose that the “true” mean response $\eta(x)$ is represented by a model
of order $d_2$ in $k$ input variables $x_1, x_2, \ldots, x_k$ of the form

$$\eta(x) = f'(x)\beta + g'(x)\delta,$$

where $x = (x_1, x_2, \ldots, x_k)'$. The fitted model is of order $d_1$ ($d_1 < d_2$) of the
form

$$\hat{y}(x) = f'(x)\hat{\beta},$$

where $\hat{\beta}$ is an estimator of $\beta$, not necessarily obtained by the method
of least squares. Let $\gamma = E(\hat{\beta})$.

(a) Give an expression for $B$, the average squared bias of $\hat{y}(x)$, in terms
of $\beta$, $\delta$, and $\gamma$.

(b) Show that $B$ achieves its minimum value if and only if $\gamma$ is of the
form $\gamma = C\tau$, where $\tau = (\beta', \delta')'$ and $C = [I : \Gamma_{11}^{-1}\Gamma_{12}]$. The
matrices $\Gamma_{11}$ and $\Gamma_{12}$ are the region moments used in formula (8.54).

(c) Deduce from part (b) that $B$ achieves its minimum value if and
only if $C\tau$ is an estimable linear function (see Searle, 1971, Section
5.4).





(d) Use part (c) to show that B achieves its minimum value if and only 
if there exists a matrix L such that C = L[X: Z], where X and Z are 
matrices consisting of the values taken by f'(x) and g'(x), respec- 
tively, at n experimental runs. 

(e) Deduce from part (d) that B achieves its minimum for any design 
for which the row space of [X: Z] contains the rows of C. 

(f) Show that if $\hat{\beta}$ is the least-squares estimator of $\beta$, that is,
$\hat{\beta} = (X'X)^{-1}X'y$, where $y$ is the vector of response values at the $n$
experimental runs, then the design property stated in part (e) holds
for any design that satisfies the conditions described in equations
(8.56).

[Note: This problem is based on an article by Karson, Manson, and 
Hader (1969), who introduced the so-called minimum bias estima- 
tion to minimize the average squared bias B.] 

8.8. Consider again Exercise 8.7. Suppose that $f'(x)\beta = \beta_0 + \sum_{i=1}^{3}\beta_i x_i$ is a
first-order model in three input variables fitted to a data set obtained
by using the design


8 -g 
8 -8 
8 8 ’ 

8 8 _ 

where $g$ is a scale factor. The region of interest is a sphere of radius 1.
Suppose that the “true” model is of the form

$$\eta(x) = \beta_0 + \sum_{i=1}^{3} \beta_i x_i + \beta_{12}x_1x_2 + \beta_{13}x_1x_3 + \beta_{23}x_2x_3.$$

(a) Can $g$ be chosen so that $D$ satisfies the conditions described in
equations (8.56)?

(b) Can g be chosen so that D satisfies the minimum bias property 
described in part (e) of Exercise 8.7? 

8.9. Consider the function

$$h(\delta, D) = \delta'\Delta\delta,$$

where $\delta$ is a vector of unknown parameters as in model (8.48), $\Delta$ is the
matrix in formula (8.53), namely $\Delta = A'\Gamma_{11}A - \Gamma_{12}'A - A'\Gamma_{12} + \Gamma_{22}$, and
$D$ is the design matrix.





(a) Show that for a given $D$, the maximum of $h(\delta, D)$ over the region
$\varphi = \{\delta \mid \delta'\delta \le r^2\}$ is equal to $r^2 e_{\max}(\Delta)$, where $e_{\max}(\Delta)$ is the largest
eigenvalue of $\Delta$.

(b) Deduce from part (a) a design criterion for choosing D. 
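The claim in part (a) can be checked numerically. The sketch below uses a randomly generated positive semidefinite matrix as a hypothetical stand-in for $\Delta$. The actual $\Delta$ depends on the design and the region moments, which are not built here. The code does three things:

- It evaluates $r^2 e_{\max}(\Delta)$.
- It confirms that this value is attained at $r$ times a unit eigenvector of the largest eigenvalue.
- It checks that random points in the ball never exceed it.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
Delta = G @ G.T                       # hypothetical positive semidefinite stand-in for Delta
r = 2.0

evals, evecs = np.linalg.eigh(Delta)  # eigenvalues in ascending order
bound = r**2 * evals[-1]              # claimed maximum: r^2 * e_max(Delta)

# The bound is attained at r times the eigenvector of the largest eigenvalue.
d_star = r * evecs[:, -1]
attained = d_star @ Delta @ d_star

# Random points inside the ball {delta : delta'delta <= r^2}.
D = rng.standard_normal((20000, 4))
D *= r * rng.uniform(0.0, 1.0, (20000, 1)) / np.linalg.norm(D, axis=1, keepdims=True)
sampled_max = np.max(np.einsum("ij,jk,ik->i", D, Delta, D))
print(bound, attained, sampled_max)
```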


8.10. Consider fitting the model

$$y(x) = f'(x)\beta + \epsilon,$$

where $\epsilon$ is a random error with a zero mean and a variance $\sigma^2$.
Suppose that the “true” mean response is given by

$$\eta(x) = f'(x)\beta + g'(x)\delta.$$

Let $X$ and $Z$ be the same matrices defined in part (d) of Exercise 8.7.
Consider the function $\lambda(\delta, D) = \delta'S\delta$, where

$$S = Z'[I - X(X'X)^{-1}X']Z,$$

and $D$ is the design matrix. The quantity $\lambda(\delta, D)/\sigma^2$ is the noncentrality
parameter associated with the lack of fit F-test for the fitted model
(see Khuri and Cornell, 1996, Section 2.6). Large values of $\lambda/\sigma^2$
increase the power of the lack of fit test. By formula (8.54), the
minimum value of $B$ is given by

$$B_{\min} = \frac{n}{\sigma^2}\,\delta'T\delta,$$

where $T = \Gamma_{22} - \Gamma_{12}'\Gamma_{11}^{-1}\Gamma_{12}$. The fitted model is considered to be
inadequate if there exists some constant $\kappa > 0$ such that $\delta'T\delta \ge \kappa$.
Show that

$$\inf_{\delta \in \Phi} \delta'S\delta = \kappa\, e_{\min}(T^{-1}S),$$

where $e_{\min}(T^{-1}S)$ is the smallest eigenvalue of $T^{-1}S$ and $\Phi$ is the
region $\{\delta \mid \delta'T\delta \ge \kappa\}$.

[Note: On the basis of this problem, we can define a new design
criterion, that which maximizes $e_{\min}(T^{-1}S)$ with respect to $D$. A design
chosen according to this criterion is called $\Lambda_1$-optimal (see Jones and
Mitchell, 1978).]
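Numerically, $e_{\min}(T^{-1}S)$ is a generalized eigenvalue. The sketch below uses randomly generated matrices $X$, $Z$, and $T$, which are hypothetical and not tied to any particular design or region. It computes the eigenvalue through a Cholesky factor $T = LL'$: the symmetric matrix $L^{-1}SL'^{-1}$ has the same eigenvalues as $T^{-1}S$.

Because $\delta'S\delta$ and $\delta'T\delta$ scale by the same factor when $\delta$ is rescaled, the infimum over $\Phi$ sits on the boundary $\delta'T\delta = \kappa$. The code therefore tests random points rescaled onto that boundary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 12, 3, 2
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, p - 1))])  # hypothetical X
Z = rng.uniform(-1, 1, (n, q))                                      # hypothetical Z
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix X(X'X)^{-1}X'
S = Z.T @ (np.eye(n) - H) @ Z
G = rng.standard_normal((q, q))
T = G @ G.T + q * np.eye(q)                    # hypothetical positive definite T
kappa = 0.5

L = np.linalg.cholesky(T)                      # T = L L'
Linv = np.linalg.inv(L)
w, V = np.linalg.eigh(Linv @ S @ Linv.T)       # same eigenvalues as T^{-1} S
e_min = w[0]

# Minimizer on the boundary delta'T delta = kappa.
delta_star = np.sqrt(kappa) * Linv.T @ V[:, 0]

# Random directions rescaled onto the boundary delta'T delta = kappa.
D = rng.standard_normal((5000, q))
D *= np.sqrt(kappa / np.einsum("ij,jk,ik->i", D, T, D))[:, None]
values = np.einsum("ij,jk,ik->i", D, S, D)
print(kappa * e_min, delta_star @ S @ delta_star, values.min())
```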

8.11. A second-order model of the form

$$y(x) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \epsilon$$

is fitted using a rotatable central composite design $D$, which consists of
a factorial $2^2$ portion, an axial portion with an axial parameter $\alpha = 2^{1/2}$,
and $n_0$ center-point replications. The settings of the $2^2$ factorial
portion are $\pm 1$. The region of interest $R$ consists of the interior and
boundary of a circle of radius $2^{1/2}$ centered at the origin.

(a) Express $V$, the average variance of the predicted response given by
formula (8.52), as a function of $n_0$.

(b) Can $n_0$ be chosen so that it minimizes $V$?
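The sketch below explores part (a) numerically. It assumes that formula (8.52) has the common Box-Draper form $V = n\,\operatorname{tr}[(X'X)^{-1}M_R]$. Here $n = 8 + n_0$ is the number of runs, and $M_R$ is the matrix of region moments of $f(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1x_2)'$, averaged uniformly over $R$. This form is our assumption, since formula (8.52) is not reproduced in this section.

The disk moments use two facts for a disk of radius $\rho$:

- $E(r^{m}) = 2\rho^{m}/(m+2)$.
- The angular average of $\cos^a\theta\,\sin^b\theta$ is $(a-1)!!\,(b-1)!!/(a+b)!!$ for even $a, b$, and zero otherwise.

```python
import numpy as np
from math import prod

# Exponents (a, b) of x1^a x2^b for f(x) = (1, x1, x2, x1^2, x2^2, x1 x2)'.
EXPONENTS = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]

def dfact(m):
    """Double factorial m!!, with (-1)!! = 0!! = 1."""
    return prod(range(m, 0, -2)) if m > 0 else 1

def disk_moment(a, b, rho):
    """E[x1^a x2^b] for x uniformly distributed on a disk of radius rho."""
    if a % 2 or b % 2:
        return 0.0
    radial = 2 * rho ** (a + b) / (a + b + 2)             # E[r^(a+b)]
    angular = dfact(a - 1) * dfact(b - 1) / dfact(a + b)  # mean of cos^a sin^b
    return radial * angular

def ccd_design(n0, alpha=np.sqrt(2)):
    """Rotatable CCD for k = 2: 2^2 factorial at +-1, 4 axial points, n0 centers."""
    fact = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    axial = [(-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha)]
    return np.array(fact + axial + [(0, 0)] * n0, dtype=float)

def average_variance(n0, rho=np.sqrt(2)):
    """Assumed form of (8.52): V = n * tr[(X'X)^{-1} M_R]."""
    D = ccd_design(n0)
    X = np.column_stack([D[:, 0] ** a * D[:, 1] ** b for a, b in EXPONENTS])
    M = np.array([[disk_moment(a1 + a2, b1 + b2, rho) for (a2, b2) in EXPONENTS]
                  for (a1, b1) in EXPONENTS])
    return X.shape[0] * np.trace(np.linalg.solve(X.T @ X, M))

for n0 in range(1, 9):
    print(n0, round(average_variance(n0), 4))
```

Under this assumed form, tabulating $V$ against $n_0$ suggests how part (b) can be answered. Without the factor $n$, the trace alone would simply decrease as center points are added.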

8.12. Suppose that we have $r$ response functions represented by the models

$$y_i = X\beta_i + \epsilon_i, \quad i = 1, 2, \ldots, r,$$

where $X$ is a known matrix of order $n \times p$ and rank $p$. The random
error vectors have the same variance-covariance structure as in Section
8.7. Let $F = E(Y) = XB$, where $Y = [y_1 : y_2 : \cdots : y_r]$ and $B = [\beta_1 : \beta_2 : \cdots : \beta_r]$.
Show that the determinant of $(Y - F)'(Y - F)$ attains a minimum
value when $B = \hat{B}$, where $\hat{B}$ is obtained by replacing each $\beta_i$ in $B$ with
$\hat{\beta}_i = (X'X)^{-1}X'y_i$ ($i = 1, 2, \ldots, r$).

[Note: The minimization of the determinant of $(Y - F)'(Y - F)$ with
respect to $B$ represents a general multiresponse estimation criterion
known as the Box-Draper determinant criterion (see Box and Draper,
1965).]
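The determinant criterion can be probed numerically. In the sketch below, the data are simulated and hypothetical. The code computes $\hat{B}$ with one linear solve, so that column $i$ equals $(X'X)^{-1}X'y_i$. It then compares $\det[(Y - X\hat{B})'(Y - X\hat{B})]$ with its value at randomly perturbed coefficient matrices.

The comparison always comes out in favor of $\hat{B}$ because of the decomposition $(Y - XB)'(Y - XB) = E'E + (\hat{B} - B)'X'X(\hat{B} - B)$. Here $E = Y - X\hat{B}$ satisfies $X'E = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 20, 3, 2
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, p - 1))])   # hypothetical design
Y = X @ rng.standard_normal((p, r)) + 0.3 * rng.standard_normal((n, r))

def det_criterion(B):
    """Box-Draper criterion det[(Y - XB)'(Y - XB)]."""
    E = Y - X @ B
    return np.linalg.det(E.T @ E)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # column i is (X'X)^{-1} X' y_i
base = det_criterion(B_hat)
perturbed = [det_criterion(B_hat + 0.1 * rng.standard_normal((p, r))) for _ in range(200)]
print(base, min(perturbed))
```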

8.13. Let $A$ be a $p \times p$ matrix with nonnegative eigenvalues. Show that

$$\det(A) \le \exp[\operatorname{tr}(A - I_p)].$$

[Note: This inequality is proved in an article by Watson (1964). It is
based on the simple inequality $a \le \exp(a - 1)$, which can be easily
proved for any real number $a$.]
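A quick numerical check of Watson's inequality follows. It computes the gap $\exp[\operatorname{tr}(A - I_p)] - \det(A)$ for two families of random matrices, which are hypothetical examples:

- symmetric positive semidefinite matrices;
- upper triangular matrices with a nonnegative diagonal, whose eigenvalues are that diagonal.

Equality holds at $A = I_p$.

```python
import numpy as np

def watson_gap(A):
    """exp[tr(A - I_p)] - det(A); Exercise 8.13 asserts this is >= 0."""
    p = A.shape[0]
    return np.exp(np.trace(A - np.eye(p))) - np.linalg.det(A)

rng = np.random.default_rng(2)
gaps = []
for _ in range(200):
    G = rng.standard_normal((3, 3))
    gaps.append(watson_gap(G @ G.T))        # symmetric, eigenvalues >= 0
    U = np.triu(rng.standard_normal((3, 3)), 1) + np.diag(rng.uniform(0.0, 3.0, 3))
    gaps.append(watson_gap(U))              # triangular: eigenvalues are the diagonal, all >= 0
print(min(gaps), watson_gap(np.eye(3)))
```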

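8.14. Let $x_1, x_2, \ldots, x_n$ be a sample of $n$ independently distributed random
vectors from a $p$-variate normal distribution $N(\mu, V)$. The corresponding
likelihood function is

$$L = \frac{1}{(2\pi)^{np/2}[\det(V)]^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'V^{-1}(x_i - \mu)\right].$$

It is known that the maximum likelihood estimate of $\mu$ is $\bar{x}$, where
$\bar{x} = (1/n)\sum_{i=1}^{n} x_i$ (see, for example, Seber, 1984, pages 59-61). Let $S$ be
the matrix

$$S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'.$$

Show that $S$ is the maximum likelihood estimate of $V$ by proving that

$$\frac{1}{[\det(V)]^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})'V^{-1}(x_i - \bar{x})\right] \le \frac{1}{[\det(S)]^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})'S^{-1}(x_i - \bar{x})\right],$$

or equivalently,

$$[\det(SV^{-1})]^{n/2} \exp\left[-\frac{n}{2}\operatorname{tr}(SV^{-1})\right] \le \exp\left(-\frac{np}{2}\right).$$

[Hint: Use the inequality given in Exercise 8.13.]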

8.15. Consider the random one-way classification model

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n_i,$$

where the $\alpha_i$'s and $\epsilon_{ij}$'s are independently distributed as $N(0, \sigma_\alpha^2)$ and
$N(0, \sigma_\epsilon^2)$. Determine the matrix $S$ and the vector $q$ in equation (8.95)
that can be used to obtain the MINQUEs of $\sigma_\alpha^2$ and $\sigma_\epsilon^2$.

8.16. Consider the linear model

$$y = X\beta + \epsilon,$$

where $X$ is a known matrix of order $n \times p$ and rank $p$, and $\epsilon$ is
normally distributed with a zero mean vector and a variance-covariance
matrix $\sigma^2 I_n$. Let $\hat{y}(x)$ denote the predicted response at a point $x$
in a region of interest $R$.

Use Scheffé's confidence intervals given by formula (8.97) to obtain
simultaneous confidence intervals on the mean response values at the
points $x_1, x_2, \ldots, x_m$ ($m \le p$) in $R$. What is the joint confidence
coefficient for these intervals?

8.17. Consider the fixed-effects two-way classification model

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk},$$

where $\alpha_i$ and $\beta_j$ are unknown parameters, $(\alpha\beta)_{ij}$ is the interaction
effect, and $\epsilon_{ijk}$ is a random error that has the normal distribution with
a zero mean and a variance $\sigma^2$.

(a) Use Scheffé's confidence intervals to obtain simultaneous confidence
intervals on all contrasts among the $\mu_i$'s, where $\mu_i = E(\bar{y}_{i..})$ and
$\bar{y}_{i..}$ is the mean of the observations at the $i$th level of the first factor.

(b) Identify those $\bar{y}_{i..}$'s that are influential contributors to the
significance of the F-test concerning the hypothesis that all the $\mu_i$'s
are equal.



CHAPTER 9 


Approximation of Functions 


The class of polynomials is undoubtedly the simplest class of functions. In
this chapter we shall discuss how to use polynomials to approximate
continuous functions. Piecewise polynomial functions (splines) will also be discussed.
Attention will be primarily confined to real-valued functions of a single
variable $x$.


9.1. WEIERSTRASS APPROXIMATION 

We may recall from Section 4.3 that if a function $f(x)$ has derivatives of all
orders in some neighborhood of the origin, then it can be represented by a
power series of the form $\sum_{n=0}^{\infty} a_n x^n$, provided the remainder in Taylor's
formula tends to zero. If $\rho$ is the radius of convergence of this series, then the
series converges uniformly for $|x| \le r$, where $r < \rho$ (see Theorem 5.4.4). It
follows that for a given $\epsilon > 0$ we can take sufficiently many terms of this power
series and obtain a polynomial $p_n(x) = \sum_{k=0}^{n} a_k x^k$ of degree $n$ for which
$|f(x) - p_n(x)| < \epsilon$ for $|x| \le r$. But a function that is not differentiable of all
orders does not have a power series representation. However, if the function is
continuous on the closed interval $[a, b]$, then it can be approximated uniformly by
a polynomial. This is guaranteed by the following theorem:

Theorem 9.1.1 (Weierstrass Approximation Theorem). Let $f: [a, b] \to R$
be a continuous function. Then, for any $\epsilon > 0$, there exists a polynomial $p(x)$
such that

$$|f(x) - p(x)| < \epsilon \quad \text{for all } x \in [a, b].$$

Proof. Without loss of generality we can consider $[a, b]$ to be the interval
$[0, 1]$. This can always be achieved by making a change of variable of the form

$$t = \frac{x - a}{b - a}.$$

As $x$ varies from $a$ to $b$, $t$ varies from 0 to 1. Thus, if necessary, we consider
that such a linear transformation has been made and that $t$ has been
renamed as $x$.

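For each $n$, let $b_n(x)$ be defined as a polynomial of degree $n$ of the form

$$b_n(x) = \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} f\left(\frac{k}{n}\right), \tag{9.1}$$

where

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}.$$

We have that

$$\sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = 1, \tag{9.2}$$

$$\sum_{k=0}^{n} \frac{k}{n}\binom{n}{k} x^k (1 - x)^{n-k} = x, \tag{9.3}$$

$$\sum_{k=0}^{n} \frac{k^2}{n^2}\binom{n}{k} x^k (1 - x)^{n-k} = \left(1 - \frac{1}{n}\right)x^2 + \frac{x}{n}. \tag{9.4}$$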


These identities can be shown as follows. Let $Y_n$ be a binomial random
variable $B(n, x)$. Thus $Y_n$ represents the number of successes in a sequence
of $n$ independent Bernoulli trials with $x$ the probability of success on a single
trial. Hence, $E(Y_n) = nx$ and $\operatorname{Var}(Y_n) = nx(1 - x)$ (see, for example, Harris,
1966, page 104; see also Exercise 5.30). It follows that

$$\sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = P(0 \le Y_n \le n) = 1. \tag{9.5}$$

Furthermore,

$$\sum_{k=0}^{n} k\binom{n}{k} x^k (1 - x)^{n-k} = E(Y_n) = nx, \tag{9.6}$$

$$\sum_{k=0}^{n} k^2\binom{n}{k} x^k (1 - x)^{n-k} = E(Y_n^2) = \operatorname{Var}(Y_n) + [E(Y_n)]^2 = nx(1 - x) + n^2x^2 = n^2\left[\left(1 - \frac{1}{n}\right)x^2 + \frac{x}{n}\right]. \tag{9.7}$$

Identities (9.2)-(9.4) follow directly from (9.5)-(9.7).
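The identities can also be confirmed numerically by summing the binomial weights directly. The sketch below does this with Python's standard `math.comb` for a few arbitrary choices of $n$ and $x$.

```python
from math import comb

def bernstein_sums(n, x):
    """Left-hand sides of identities (9.2), (9.3), and (9.4) for given n and x."""
    w = [comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1)]  # binomial pmf
    s0 = sum(w)
    s1 = sum((k / n) * wk for k, wk in enumerate(w))
    s2 = sum((k / n) ** 2 * wk for k, wk in enumerate(w))
    return s0, s1, s2

for n, x in [(7, 0.3), (20, 0.85), (3, 0.5)]:
    s0, s1, s2 = bernstein_sums(n, x)
    print(n, x, s0, s1, s2, (1 - 1 / n) * x**2 + x / n)
```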





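Let us now consider the difference $f(x) - b_n(x)$, which with the help of
identity (9.2) can be written as

$$f(x) - b_n(x) = \sum_{k=0}^{n}\left[f(x) - f\left(\frac{k}{n}\right)\right]\binom{n}{k} x^k (1 - x)^{n-k}. \tag{9.8}$$

Since $f(x)$ is continuous on $[0, 1]$, it must be bounded and uniformly
continuous there (see Theorems 3.4.5 and 3.4.6). Hence, for the given $\epsilon > 0$,
there exist numbers $\delta$ and $m$ such that

$$|f(x_1) - f(x_2)| < \frac{\epsilon}{2} \quad \text{if } |x_1 - x_2| < \delta$$

and

$$|f(x)| < m \quad \text{for all } x \in [0, 1].$$

From formula (9.8) we then have

$$|f(x) - b_n(x)| \le \sum_{k=0}^{n}\left|f(x) - f\left(\frac{k}{n}\right)\right|\binom{n}{k} x^k (1 - x)^{n-k}.$$

If $|x - k/n| < \delta$, then $|f(x) - f(k/n)| < \epsilon/2$. Otherwise, we have
$|f(x) - f(k/n)| < 2m$ and $(x - k/n)^2/\delta^2 \ge 1$ for $0 \le x \le 1$. Consequently, by
using identities (9.2)-(9.4) we obtain

$$|f(x) - b_n(x)| \le \frac{\epsilon}{2} + \frac{2m}{\delta^2}\sum_{k=0}^{n}\left(x - \frac{k}{n}\right)^2\binom{n}{k} x^k (1 - x)^{n-k} = \frac{\epsilon}{2} + \frac{2m}{\delta^2}\left[x^2 - 2x^2 + \left(1 - \frac{1}{n}\right)x^2 + \frac{x}{n}\right].$$

Hence,

$$|f(x) - b_n(x)| \le \frac{\epsilon}{2} + \frac{2m\,x(1 - x)}{n\delta^2} \le \frac{\epsilon}{2} + \frac{m}{2n\delta^2}, \tag{9.9}$$

since $x(1 - x) \le \frac{1}{4}$ for $0 \le x \le 1$. By choosing $n$ large enough that

$$\frac{m}{2n\delta^2} < \frac{\epsilon}{2},$$

we conclude that

$$|f(x) - b_n(x)| < \epsilon$$

for all $x \in [0, 1]$. The proof of the theorem follows by taking $p(x) = b_n(x)$.

□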

Definition 9.1.1. Let $f(x)$ be defined on $[0, 1]$. The polynomial $b_n(x)$
defined by formula (9.1) is called the Bernstein polynomial of degree $n$ for
$f(x)$. □
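The Bernstein polynomial (9.1) is easy to evaluate directly. The sketch below is a minimal vectorized implementation; the name `bernstein` and the test function are hypothetical choices. It applies $b_n$ to $f(x) = |x - \frac{1}{2}|$, which is continuous but not differentiable at $\frac{1}{2}$, and reports the maximum error on a grid for increasing $n$.

Two properties are easy to confirm with the same function:

- $b_n$ reproduces $f(0)$ and $f(1)$.
- For $f(x) = x^2$, identity (9.4) gives exactly $b_n(x) = x^2 + x(1 - x)/n$.

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial b_n(x) of formula (9.1) at points x in [0, 1]."""
    x = np.asarray(x, dtype=float)[..., None]             # trailing axis for k
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    weights = binom * x**k * (1.0 - x) ** (n - k)          # binomial probabilities
    return weights @ f(k / n)

def f(t):
    return np.abs(t - 0.5)      # continuous on [0, 1], not differentiable at 1/2

grid = np.linspace(0.0, 1.0, 1001)
for n in (10, 40, 160):
    err = np.max(np.abs(f(grid) - bernstein(f, n, grid)))
    print(n, err)
```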

By the proof of Theorem 9.1.1 we conclude that the sequence $\{b_n(x)\}_{n=1}^{\infty}$
of Bernstein polynomials converges uniformly to $f(x)$ on $[0, 1]$. These
polynomials are useful in that they not only prove the existence of an approximating
polynomial for $f(x)$, but also provide a simple explicit representation for it.
Another advantage of Bernstein polynomials is that if $f(x)$ is continuously
differentiable on $[0, 1]$, then the derivative of $b_n(x)$ also converges uniformly
to $f'(x)$. A more general statement is given by the next theorem, whose proof
can be found in Davis (1975, Theorem 6.3.2, page 113).

Theorem 9.1.2. Let $f(x)$ be $p$ times differentiable on $[0, 1]$. If the $p$th
derivative is continuous there, then

$$\lim_{n \to \infty} \frac{d^p b_n(x)}{dx^p} = \frac{d^p f(x)}{dx^p}$$

uniformly on $[0, 1]$.

Obviously, the knowledge that the sequence $\{b_n(x)\}_{n=1}^{\infty}$ converges
uniformly to $f(x)$ on $[0, 1]$ is not complete without knowing something about the
rate of convergence. For this purpose we need to define the so-called
modulus of continuity of $f(x)$ on $[a, b]$.


Definition 9.1.2. If $f(x)$ is continuous on $[a, b]$, then, for any $\delta > 0$, the
modulus of continuity of $f(x)$ on $[a, b]$ is

$$\omega(\delta) = \sup_{|x_1 - x_2| \le \delta} |f(x_1) - f(x_2)|,$$

where $x_1$ and $x_2$ are points in $[a, b]$. □

On the basis of Definition 9.1.2 we have the following properties concerning
the modulus of continuity:

Lemma 9.1.1. If $0 < \delta_1 \le \delta_2$, then $\omega(\delta_1) \le \omega(\delta_2)$.

Lemma 9.1.2. For a function $f(x)$ to be uniformly continuous on $[a, b]$ it
is necessary and sufficient that $\lim_{\delta \to 0} \omega(\delta) = 0$.

The proofs of Lemmas 9.1.1 and 9.1.2 are left to the reader.

Lemma 9.1.3. For any $\lambda > 0$, $\omega(\lambda\delta) \le (\lambda + 1)\,\omega(\delta)$.

Proof. Suppose that $\lambda > 0$ is given. We can find an integer $n$ such that
$n \le \lambda < n + 1$. By Lemma 9.1.1, $\omega(\lambda\delta) \le \omega[(n + 1)\delta]$. Let $x_1$ and $x_2$ be two
points in $[a, b]$ such that $x_1 < x_2$ and $|x_1 - x_2| \le (n + 1)\delta$. Let us also divide
the interval $[x_1, x_2]$ into $n + 1$ equal parts, each of length $(x_2 - x_1)/(n + 1)$,
by means of the partition points

$$y_i = x_1 + i\,\frac{x_2 - x_1}{n + 1}, \quad i = 0, 1, \ldots, n + 1.$$

Then

$$|f(x_1) - f(x_2)| = |f(x_2) - f(x_1)| = \left|\sum_{i=0}^{n}[f(y_{i+1}) - f(y_i)]\right| \le \sum_{i=0}^{n}|f(y_{i+1}) - f(y_i)| \le (n + 1)\,\omega(\delta),$$

since $|y_{i+1} - y_i| = [1/(n + 1)]\,|x_2 - x_1| \le \delta$ for $i = 0, 1, \ldots, n$. It follows that

$$\omega[(n + 1)\delta] = \sup_{|x_1 - x_2| \le (n + 1)\delta} |f(x_1) - f(x_2)| \le (n + 1)\,\omega(\delta).$$

Consequently, since $n \le \lambda$,

$$\omega(\lambda\delta) \le \omega[(n + 1)\delta] \le (n + 1)\,\omega(\delta) \le (\lambda + 1)\,\omega(\delta).$$

□

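Theorem 9.1.3. Let $f(x)$ be continuous on $[0, 1]$, and let $b_n(x)$ be the
Bernstein polynomial defined by formula (9.1). Then

$$|f(x) - b_n(x)| \le \frac{3}{2}\,\omega\left(\frac{1}{\sqrt{n}}\right)$$

for all $x \in [0, 1]$, where $\omega(\delta)$ is the modulus of continuity of $f(x)$ on $[0, 1]$.

Proof. Using formula (9.1) and identity (9.2), we have that

$$|f(x) - b_n(x)| = \left|\sum_{k=0}^{n}\left[f(x) - f\left(\frac{k}{n}\right)\right]\binom{n}{k}x^k(1 - x)^{n-k}\right| \le \sum_{k=0}^{n}\left|f(x) - f\left(\frac{k}{n}\right)\right|\binom{n}{k}x^k(1 - x)^{n-k} \le \sum_{k=0}^{n}\omega\left(\left|x - \frac{k}{n}\right|\right)\binom{n}{k}x^k(1 - x)^{n-k}.$$

Now, by applying Lemma 9.1.3 we can write

$$\omega\left(\left|x - \frac{k}{n}\right|\right) = \omega\left(n^{1/2}\left|x - \frac{k}{n}\right|\,n^{-1/2}\right) \le \left(1 + n^{1/2}\left|x - \frac{k}{n}\right|\right)\omega(n^{-1/2}).$$

Thus

$$|f(x) - b_n(x)| \le \sum_{k=0}^{n}\left(1 + n^{1/2}\left|x - \frac{k}{n}\right|\right)\omega(n^{-1/2})\binom{n}{k}x^k(1 - x)^{n-k} \le \omega(n^{-1/2})\left[1 + n^{1/2}\sum_{k=0}^{n}\left|x - \frac{k}{n}\right|\binom{n}{k}x^k(1 - x)^{n-k}\right].$$

But, by the Cauchy-Schwarz inequality (see part 1 of Theorem 2.1.2), we
have

$$\sum_{k=0}^{n}\left|x - \frac{k}{n}\right|\binom{n}{k}x^k(1 - x)^{n-k} = \sum_{k=0}^{n}\left|x - \frac{k}{n}\right|\left[\binom{n}{k}x^k(1 - x)^{n-k}\right]^{1/2}\left[\binom{n}{k}x^k(1 - x)^{n-k}\right]^{1/2}$$

$$\le \left[\sum_{k=0}^{n}\left(x - \frac{k}{n}\right)^2\binom{n}{k}x^k(1 - x)^{n-k}\right]^{1/2}\left[\sum_{k=0}^{n}\binom{n}{k}x^k(1 - x)^{n-k}\right]^{1/2}$$

$$= \left[\sum_{k=0}^{n}\left(x - \frac{k}{n}\right)^2\binom{n}{k}x^k(1 - x)^{n-k}\right]^{1/2} \quad \text{by identity (9.2)}$$

$$= \left[x^2 - 2x^2 + \left(1 - \frac{1}{n}\right)x^2 + \frac{x}{n}\right]^{1/2} \quad \text{by identities (9.3) and (9.4)}$$

$$= \left[\frac{x(1 - x)}{n}\right]^{1/2} \le \left(\frac{1}{4n}\right)^{1/2}, \quad \text{since } x(1 - x) \le \tfrac{1}{4}.$$

It follows that

$$|f(x) - b_n(x)| \le \omega(n^{-1/2})\left[1 + n^{1/2}\left(\frac{1}{4n}\right)^{1/2}\right],$$

that is,

$$|f(x) - b_n(x)| \le \frac{3}{2}\,\omega(n^{-1/2})$$

for all $x \in [0, 1]$. □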


We note that Theorem 9.1.3 can be used to prove Theorem 9.1.1 as
follows. If $f(x)$ is continuous on $[0, 1]$, then $f(x)$ is uniformly continuous on
$[0, 1]$. Hence, by Lemma 9.1.2, $\omega(n^{-1/2}) \to 0$ as $n \to \infty$.

Corollary 9.1.1. If $f(x)$ is a Lipschitz continuous function $\mathrm{Lip}(K, \alpha)$ on
$[0, 1]$, then

$$|f(x) - b_n(x)| \le \frac{3}{2}Kn^{-\alpha/2} \tag{9.10}$$

for all $x \in [0, 1]$.

Proof. By Definition 3.4.6,

$$|f(x_1) - f(x_2)| \le K|x_1 - x_2|^{\alpha}$$

for all $x_1, x_2$ in $[0, 1]$. Thus

$$\omega(\delta) \le K\delta^{\alpha}.$$

By Theorem 9.1.3 we then have

$$|f(x) - b_n(x)| \le \frac{3}{2}Kn^{-\alpha/2}$$

for all $x \in [0, 1]$. □

Theorem 9.1.4 (Voronovsky's Theorem). If $f(x)$ is bounded on $[0, 1]$ and
has a second-order derivative at a point $x_0$ in $[0, 1]$, then

$$\lim_{n \to \infty} n[b_n(x_0) - f(x_0)] = \frac{1}{2}x_0(1 - x_0)f''(x_0).$$

Proof. See Davis (1975, Theorem 6.3.6, page 117). □

We note from Corollary 9.1.1 and Voronovsky's theorem that the convergence
of Bernstein polynomials can be very slow. For example, if $f(x)$
satisfies the conditions of Voronovsky's theorem, then at every point $x \in [0, 1]$
where $f''(x) \ne 0$, $b_n(x)$ converges to $f(x)$ just like $c/n$, where $c$ is a constant.

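Example 9.1.1. We recall from Section 3.4.2 that $f(x) = \sqrt{x}$ is $\mathrm{Lip}(1, \frac{1}{2})$
for $x \ge 0$. Then, by Corollary 9.1.1,

$$|\sqrt{x} - b_n(x)| \le \frac{3}{2n^{1/4}}$$

for $0 \le x \le 1$, where

$$b_n(x) = \sum_{k=0}^{n}\binom{n}{k}x^k(1 - x)^{n-k}\left(\frac{k}{n}\right)^{1/2}.$$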
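The sketch below compares the observed maximum error $\max_x |\sqrt{x} - b_n(x)|$ on a fine grid with the bound $3/(2n^{1/4})$ from Corollary 9.1.1. The grid and the values of $n$ are arbitrary illustrative choices. For $f(x) = \sqrt{x}$ the modulus of continuity is $\omega(\delta) = \sqrt{\delta}$, so Theorem 9.1.3 gives the same bound. The observed errors fall well below it, in line with the remark that the bound reflects a slow rate.

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial b_n(x) of formula (9.1), vectorized over x."""
    x = np.asarray(x, dtype=float)[..., None]
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    return (binom * x**k * (1.0 - x) ** (n - k)) @ f(k / n)

grid = np.linspace(0.0, 1.0, 2001)
results = []
for n in (4, 16, 64, 256):
    err = np.max(np.abs(np.sqrt(grid) - bernstein(np.sqrt, n, grid)))
    bound = 1.5 / n**0.25           # Corollary 9.1.1 with K = 1, alpha = 1/2
    results.append((n, err, bound))
    print(f"n = {n:3d}   max error = {err:.4f}   bound = {bound:.4f}")
```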


9.2. APPROXIMATION BY POLYNOMIAL INTERPOLATION 

One possible method to approximate a function $f(x)$ with a polynomial $p(x)$
is to select such a polynomial so that both $f(x)$ and $p(x)$ have the same
values at a certain number of points in the domain of $f(x)$. This procedure is
called interpolation. The rationale behind it is that if $f(x)$ agrees with $p(x)$
at some known points, then the two functions should be close to one another
at intermediate points.

We begin with the following result.


Theorem 9.2.1. Let $a_0, a_1, \ldots, a_n$ be $n + 1$ distinct points in $R$, the set of
real numbers. Let $b_0, b_1, \ldots, b_n$ be any given set of $n + 1$ real numbers. Then,
there exists a unique polynomial $p(x)$ of degree $\le n$ such that $p(a_i) = b_i$,
$i = 0, 1, \ldots, n$.

Proof. Since $p(x)$ is a polynomial of degree $\le n$, it can be represented as
$p(x) = \sum_{j=0}^{n} c_j x^j$. We must then have

$$\sum_{j=0}^{n} c_j a_i^j = b_i, \quad i = 0, 1, \ldots, n.$$

These equations can be written in vector form as

$$\begin{bmatrix} 1 & a_0 & a_0^2 & \cdots & a_0^n \\ 1 & a_1 & a_1^2 & \cdots & a_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^n \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix}. \tag{9.11}$$

The determinant of the $(n + 1) \times (n + 1)$ matrix on the left side of equation
(9.11) is known as Vandermonde's determinant and is equal to $\prod_{i > j}(a_i - a_j)$.
The proof of this last assertion can be found in, for example, Graybill (1983,
Theorem 8.12.2, page 266). Since the $a_i$'s are distinct, this determinant is
different from zero. It follows that this matrix is nonsingular. Hence, equation
(9.11) provides a unique solution for $c_0, c_1, \ldots, c_n$. □

Corollary 9.2.1. The polynomial $p(x)$ in Theorem 9.2.1 can be represented as

$$p(x) = \sum_{i=0}^{n} b_i\,l_i(x), \tag{9.12}$$

where

$$l_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - a_j}{a_i - a_j}, \quad i = 0, 1, \ldots, n. \tag{9.13}$$

Proof. We have that $l_i(x)$ is a polynomial of degree $n$ ($i = 0, 1, \ldots, n$).
Furthermore, $l_i(a_j) = 0$ if $i \ne j$, and $l_i(a_i) = 1$ ($i = 0, 1, \ldots, n$). It follows that
$\sum_{i=0}^{n} b_i\,l_i(x)$ is a polynomial of degree $\le n$ and assumes the values
$b_0, b_1, \ldots, b_n$ at $a_0, a_1, \ldots, a_n$, respectively. This polynomial is unique by
Theorem 9.2.1. □

Definition 9.2.1. The polynomial defined by formula (9.12) is called a
Lagrange interpolating polynomial. The points $a_0, a_1, \ldots, a_n$ are called points
of interpolation (or nodes), and $l_i(x)$ in formula (9.13) is called the $i$th
Lagrange polynomial associated with the $a_i$'s. □

The values $b_0, b_1, \ldots, b_n$ in formula (9.12) are frequently the values of
some function $f(x)$ at the points $a_0, a_1, \ldots, a_n$. Thus $f(x)$ and the polynomial
$p(x)$ in formula (9.12) attain the same values at these points. The polynomial
$p(x)$, which can be written as

$$p(x) = \sum_{i=0}^{n} f(a_i)\,l_i(x), \tag{9.14}$$

therefore provides an approximation for $f(x)$ over $[a_0, a_n]$.
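Formulas (9.13) and (9.14) translate directly into code. The sketch below builds the matrix of Lagrange polynomial values $l_i(x)$ and forms $p(x)$ from it. As a cross-check of Theorem 9.2.1, it solves the Vandermonde system (9.11) for the coefficients $c_0, \ldots, c_n$. The nodes and the choice $f(x) = e^x$ are hypothetical. Both routes should give the same polynomial, and the basis matrix evaluated at the nodes should be the identity.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Matrix whose (m, i) entry is l_i(x_m), the ith Lagrange polynomial (9.13)."""
    nodes = np.asarray(nodes, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    L = np.ones((x.size, nodes.size))
    for i, ai in enumerate(nodes):
        for j, aj in enumerate(nodes):
            if j != i:
                L[:, i] *= (x - aj) / (ai - aj)
    return L

def lagrange_interp(nodes, values, x):
    """p(x) = sum_i f(a_i) l_i(x), formula (9.14)."""
    return lagrange_basis(nodes, x) @ np.asarray(values, dtype=float)

# Hypothetical example: interpolate exp at four distinct nodes.
nodes = np.array([-1.0, 0.0, 0.5, 2.0])
values = np.exp(nodes)
c = np.linalg.solve(np.vander(nodes, increasing=True), values)   # c_0, ..., c_n from (9.11)
x = np.linspace(-1.0, 2.0, 7)
print(lagrange_interp(nodes, values, x))
print(np.polynomial.polynomial.polyval(x, c))
```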

Example 9.2.1. Consider the function $f(x) = x^{1/2}$. Let $a_0 = 60$, $a_1 = 70$,
$a_2 = 85$, $a_3 = 105$ be interpolation points. Then

$$p(x) = 7.7460\,l_0(x) + 8.3666\,l_1(x) + 9.2195\,l_2(x) + 10.2470\,l_3(x),$$

where

$$l_0(x) = \frac{(x - 70)(x - 85)(x - 105)}{(60 - 70)(60 - 85)(60 - 105)},$$

$$l_1(x) = \frac{(x - 60)(x - 85)(x - 105)}{(70 - 60)(70 - 85)(70 - 105)},$$

$$l_2(x) = \frac{(x - 60)(x - 70)(x - 105)}{(85 - 60)(85 - 70)(85 - 105)},$$

$$l_3(x) = \frac{(x - 60)(x - 70)(x - 85)}{(105 - 60)(105 - 70)(105 - 85)}.$$


Table 9.1. Approximation of f(x) by the Lagrange Interpolating
Polynomial p(x)

      x        f(x)        p(x)
     60     7.74597     7.74597
     64     8.00000     7.99978
     68     8.24621     8.24611
     70     8.36660     8.36660
     74     8.60233     8.60251
     78     8.83176     8.83201
     82     9.05539     9.05555
     85     9.21954     9.21954
     90     9.48683     9.48646
     94     9.69536     9.69472
     98     9.89949     9.89875
    102    10.09950    10.09899
    105    10.24695    10.24695


Using $p(x)$ as an approximation of $f(x)$ over the interval $[60, 105]$, tabulated
values of $f(x)$ and $p(x)$ were obtained at several points inside this interval.
The results are given in Table 9.1.
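Table 9.1 can be regenerated with a few lines of code. The sketch below evaluates the Lagrange interpolating polynomial (9.14) through $(a_i, \sqrt{a_i})$ for the nodes of Example 9.2.1. It prints $f(x)$, $p(x)$, and the $p(x)$ column of Table 9.1 side by side. The code uses unrounded values of $\sqrt{a_i}$ rather than the four-decimal coefficients shown in the example, so small differences in the last printed digit are possible.

```python
import numpy as np

nodes = np.array([60.0, 70.0, 85.0, 105.0])     # a_0, ..., a_3 from Example 9.2.1
values = np.sqrt(nodes)                         # f(a_i) = a_i^{1/2}

def p(x):
    """Lagrange interpolating polynomial (9.14) through (a_i, sqrt(a_i))."""
    total = 0.0
    for i, ai in enumerate(nodes):
        li = 1.0
        for j, aj in enumerate(nodes):
            if j != i:
                li *= (x - aj) / (ai - aj)
        total += values[i] * li
    return total

# (x, p(x)) pairs as printed in Table 9.1.
table_p = {60: 7.74597, 64: 7.99978, 68: 8.24611, 70: 8.36660, 74: 8.60251,
           78: 8.83201, 82: 9.05555, 85: 9.21954, 90: 9.48646, 94: 9.69472,
           98: 9.89875, 102: 10.09899, 105: 10.24695}

for x, p_book in table_p.items():
    print(f"{x:4d}  f = {np.sqrt(x):9.5f}  p = {p(x):9.5f}  (Table 9.1: {p_book:9.5f})")
```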


9.2.1. The Accuracy of Lagrange Interpolation 

Let us now address the question of evaluating the accuracy of Lagrange 
interpolation. The answer to this question is given in the next theorem. 

Theorem 9.2.2. Suppose that $f(x)$ has $n$ continuous derivatives on the
interval $[a, b]$ and its $(n + 1)$st derivative exists on $(a, b)$. Let
$a = a_0 < a_1 < \cdots < a_n = b$ be $n + 1$ points in $[a, b]$. If $p(x)$ is the Lagrange
interpolating polynomial defined by formula (9.14), then for any $x \in [a, b]$,
$x \ne a_i$ ($i = 0, 1, \ldots, n$), there exists a point $c \in (a, b)$, which depends on $x$,
such that

$$f(x) - p(x) = \frac{f^{(n+1)}(c)\,g_{n+1}(x)}{(n + 1)!}, \tag{9.15}$$

where

$$g_{n+1}(x) = \prod_{i=0}^{n}(x - a_i).$$


Proof. Define the function $h(t)$ as

$$h(t) = f(t) - p(t) - [f(x) - p(x)]\,\frac{g_{n+1}(t)}{g_{n+1}(x)}.$$

If $t = x$, then $h(x) = 0$. For $t = a_i$ ($i = 0, 1, \ldots, n$),

$$h(a_i) = f(a_i) - p(a_i) - [f(x) - p(x)]\,\frac{g_{n+1}(a_i)}{g_{n+1}(x)} = 0.$$

The function $h(t)$ has $n$ continuous derivatives on $[a, b]$, and its $(n + 1)$st
derivative exists on $(a, b)$. Furthermore, $h(t)$ vanishes at $x$ and at all $n + 1$
interpolation points, that is, it has at least $n + 2$ different zeros in $[a, b]$. By
Rolle's theorem (Theorem 4.2.1), $h'(t)$ vanishes at least once between any
two zeros of $h(t)$ and thus has at least $n + 1$ different zeros in $(a, b)$. Also by
Rolle's theorem, $h''(t)$ has at least $n$ different zeros in $(a, b)$. By continuing
this argument, we see that $h^{(n+1)}(t)$ has at least one zero in $(a, b)$, say at the
point $c$. But,

$$h^{(n+1)}(c) = f^{(n+1)}(c) - p^{(n+1)}(c) - \frac{f(x) - p(x)}{g_{n+1}(x)}\,g_{n+1}^{(n+1)}(c) = f^{(n+1)}(c) - \frac{f(x) - p(x)}{g_{n+1}(x)}\,(n + 1)!,$$

since $p(t)$ is a polynomial of degree $\le n$ and $g_{n+1}(t)$ is a polynomial of the
form $t^{n+1} + \lambda_1 t^n + \lambda_2 t^{n-1} + \cdots + \lambda_{n+1}$ for suitable constants $\lambda_1, \lambda_2, \ldots, \lambda_{n+1}$.
We thus have

$$f^{(n+1)}(c) - \frac{f(x) - p(x)}{g_{n+1}(x)}\,(n + 1)! = 0,$$

from which we can conclude formula (9.15). □


Corollary 9.2.2. Suppose that $f^{(n+1)}(x)$ is continuous on $[a, b]$. Let
$\tau_{n+1} = \sup_{a \le x \le b}|f^{(n+1)}(x)|$ and $\kappa_{n+1} = \sup_{a \le x \le b}|g_{n+1}(x)|$. Then

$$\sup_{a \le x \le b}|f(x) - p(x)| \le \frac{\tau_{n+1}\,\kappa_{n+1}}{(n + 1)!}.$$

Proof. This follows directly from formula (9.15) and the fact that
$f(x) - p(x) = 0$ for $x = a_0, a_1, \ldots, a_n$. □

We note that $\kappa_{n+1}$, being the supremum of $|g_{n+1}(x)| = |\prod_{i=0}^{n}(x - a_i)|$
over $[a, b]$, is a function of the location of the $a_i$'s. From Corollary 9.2.2 we
can then write

$$\sup_{a \le x \le b}|f(x) - p(x)| \le \frac{\phi(a_0, a_1, \ldots, a_n)}{(n + 1)!}\,\sup_{a \le x \le b}|f^{(n+1)}(x)|, \tag{9.16}$$

where $\phi(a_0, a_1, \ldots, a_n) = \sup_{a \le x \le b}|g_{n+1}(x)|$. This inequality provides us with
an upper bound on the error of approximating $f(x)$ with $p(x)$ over the
interpolation region. We refer to this error as interpolation error. The upper
bound clearly shows that the interpolation error depends on the location of
the interpolation points.


Corollary 9.2.3. If, in Corollary 9.2.2, $n = 2$, and if $a_1 - a_0 = a_2 - a_1 = \delta$,
then

$$\sup_{a_0 \le x \le a_2}|f(x) - p(x)| \le \frac{\sqrt{3}}{27}\,\tau_3\,\delta^3.$$

Proof. Consider $g_3(x) = (x - a_0)(x - a_1)(x - a_2)$, which can be written as
$g_3(x) = z(z^2 - \delta^2)$, where $z = x - a_1$. The function $|g_3(x)|$ is symmetric with respect
to $x = a_1$. It is easy to see that $|g_3(x)|$ attains an absolute maximum over
$a_0 \le x \le a_2$, or equivalently, $-\delta \le z \le \delta$, when $z = \pm\delta/\sqrt{3}$. Hence,

$$\kappa_3 = \sup_{a_0 \le x \le a_2}|g_3(x)| = \max_{-\delta \le z \le \delta}|z(z^2 - \delta^2)| = \frac{2\delta^3}{3\sqrt{3}}.$$

By applying Corollary 9.2.2 we obtain

$$\sup_{a_0 \le x \le a_2}|f(x) - p(x)| \le \frac{\tau_3}{3!}\cdot\frac{2\delta^3}{3\sqrt{3}} = \frac{\sqrt{3}}{27}\,\tau_3\,\delta^3.$$

□

We have previously noted that the interpolation error depends on the choice
of the interpolation points. This leads us to the following important question:
How can the interpolation points be chosen so as to minimize the interpolation
error? The answer to this question lies in inequality (9.16). One reasonable
criterion for the choice of interpolation points is the minimization of
$\phi(a_0, a_1, \ldots, a_n)$ with respect to $a_0, a_1, \ldots, a_n$. It turns out that the optimal
locations of $a_0, a_1, \ldots, a_n$ are given by the zeros of the Chebyshev polynomial
(of the first kind) of degree $n + 1$ (see Section 10.4.1).


Definition 9.2.2. The Chebyshev polynomial of degree $n$ is defined by

$$T_n(x) = \cos(n \arccos x) = x^n + \binom{n}{2}x^{n-2}(x^2 - 1) + \binom{n}{4}x^{n-4}(x^2 - 1)^2 + \cdots, \quad n = 0, 1, \ldots. \tag{9.17}$$

Obviously, by the definition of $T_n(x)$, $-1 \le x \le 1$. One of the properties of
$T_n(x)$ is that it has simple zeros at the following $n$ points:

$$\zeta_i = \cos\left(\frac{2i - 1}{2n}\,\pi\right), \quad i = 1, 2, \ldots, n.$$

□

The proof of this property is given in Davis (1975, pages 61-62).

We can consider Chebyshev polynomials defined on the interval $[a, b]$ by
making the transformation

$$x = \frac{a + b}{2} + \frac{b - a}{2}\,t,$$

which transforms the interval $-1 \le t \le 1$ into the interval $a \le x \le b$. In this
case, the zeros of the Chebyshev polynomial of degree $n$ over the interval
$[a, b]$ are given by

$$z_i = \frac{a + b}{2} + \frac{b - a}{2}\cos\left(\frac{2i - 1}{2n}\,\pi\right), \quad i = 1, 2, \ldots, n.$$

We refer to the $z_i$'s as Chebyshev points. These points can be obtained
geometrically by subdividing the semicircle over the interval $[a, b]$ into $n$
equal arcs and then projecting the midpoint of each arc onto the interval (see
De Boor, 1978, page 26).

Chebyshev points have a very interesting property that pertains to the
minimization of $\phi(a_0, a_1, \ldots, a_n)$ in inequality (9.16). This property is
described in Theorem 9.2.3, whose proof can be found in Davis (1975, Section
3.3); see also De Boor (1978, page 30).


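Theorem 9.2.3. The function

$$\phi(a_0, a_1, \ldots, a_n) = \sup_{a \le x \le b}\left|\prod_{i=0}^{n}(x - a_i)\right|,$$

where the $a_i$'s belong to the interval $[a, b]$, achieves its minimum at the zeros
of the Chebyshev polynomial of degree $n + 1$, that is, at

$$z_i = \frac{a + b}{2} + \frac{b - a}{2}\cos\left(\frac{2i + 1}{2n + 2}\,\pi\right), \quad i = 0, 1, \ldots, n, \tag{9.18}$$

and

$$\min_{a_0, a_1, \ldots, a_n}\phi(a_0, a_1, \ldots, a_n) = \frac{2(b - a)^{n+1}}{4^{n+1}}.$$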
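Theorem 9.2.3 is easy to illustrate numerically. The sketch below computes the Chebyshev points (9.18) and approximates $\phi$ by a maximum over a fine grid that includes the endpoints $a$ and $b$. It then compares that value with the value for equally spaced nodes and with the minimum $2(b - a)^{n+1}/4^{n+1}$. The interval $[0, 2]$ and $n = 5$ are arbitrary choices, and the grid maximum is only an approximation of the supremum.

```python
import numpy as np

def chebyshev_points(a, b, n):
    """The n + 1 points of formula (9.18): zeros of T_{n+1} mapped onto [a, b]."""
    i = np.arange(n + 1)
    return (a + b) / 2 + (b - a) / 2 * np.cos((2 * i + 1) * np.pi / (2 * n + 2))

def phi(nodes, a, b, grid_size=20001):
    """Grid approximation of sup over [a, b] of |prod_i (x - a_i)|."""
    x = np.linspace(a, b, grid_size)
    return np.max(np.abs(np.prod(x[:, None] - nodes[None, :], axis=1)))

a, b, n = 0.0, 2.0, 5
cheb = chebyshev_points(a, b, n)
equi = np.linspace(a, b, n + 1)
print(phi(cheb, a, b), phi(equi, a, b), 2 * (b - a) ** (n + 1) / 4 ** (n + 1))
```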


From Theorem 9.2.3 and inequality (9.16) we conclude that the choice of
the Chebyshev points given in formula (9.18) is optimal in the sense of
reducing the interpolation error. In other words, among all sets of
interpolation points of size $n + 1$ each, Chebyshev points produce a Lagrange
polynomial approximation for $f(x)$ over the interval $[a, b]$ with a minimum
upper bound on the error of approximation. Using inequality (9.16), we
obtain the following interesting result:

$$\sup_{a \le x \le b}|f(x) - p(x)| \le \frac{2}{(n + 1)!}\left(\frac{b - a}{4}\right)^{n+1}\sup_{a \le x \le b}|f^{(n+1)}(x)|. \tag{9.19}$$

The use of Chebyshev points in the construction of the Lagrange interpolating
polynomial $p(x)$ for the function $f(x)$ over $[a, b]$ produces an approximation
which, for all practical purposes, differs very little from the best possible
approximation of $f(x)$ by a polynomial of the same degree. This was shown
by Powell (1967). More explicitly, let $p^*(x)$ be the best approximating
polynomial of $f(x)$ of the same degree as $p(x)$ over $[a, b]$. Then, obviously,

$$\sup_{a \le x \le b}|f(x) - p^*(x)| \le \sup_{a \le x \le b}|f(x) - p(x)|.$$

De Boor (1978, page 31) pointed out that for $n \le 20$,

$$\sup_{a \le x \le b}|f(x) - p(x)| \le 4\sup_{a \le x \le b}|f(x) - p^*(x)|.$$

This indicates that the error of interpolation which results from the use of
Lagrange polynomials in combination with Chebyshev points does not exceed
the minimum approximation error by more than a factor of 4 for $n \le 20$. This
is a very useful result, since the derivation of the best approximating
polynomial can be tedious and complicated, whereas a polynomial
approximation obtained by Lagrange interpolation that uses Chebyshev points as
interpolation points is simple and straightforward.


9.2.2. A Combination of Interpolation and Approximation 

In Section 9.1 we learned how to approximate a continuous function
$f: [a, b] \to R$ with a polynomial by applying the Weierstrass theorem. In this
section we have seen how to interpolate values of $f$ on $[a, b]$ by using
Lagrange polynomials. We now show that these two processes can be
combined. More specifically, suppose that we are given $n + 1$ distinct points
in $[a, b]$, which we denote by $a_0, a_1, \ldots, a_n$, with $a_0 = a$ and $a_n = b$. Let $\epsilon > 0$
be given. We need to find a polynomial $q(x)$ such that $|f(x) - q(x)| < \epsilon$ for
all $x$ in $[a, b]$, and $f(a_i) = q(a_i)$, $i = 0, 1, \ldots, n$.

By Theorem 9.1.1 there exists a polynomial $p(x)$ such that

$$|f(x) - p(x)| < \epsilon' \quad \text{for all } x \in [a, b],$$

where $\epsilon' < \epsilon/(1 + M)$, and $M$ is a nonnegative number to be described later.
Furthermore, by Theorem 9.2.1 there exists a unique polynomial $u(x)$ of degree
$\le n$ such that

$$u(a_i) = f(a_i) - p(a_i), \quad i = 0, 1, \ldots, n.$$

This polynomial is given by

$$u(x) = \sum_{i=0}^{n}[f(a_i) - p(a_i)]\,l_i(x), \tag{9.20}$$

where $l_i(x)$ is the $i$th Lagrange polynomial defined in formula (9.13). Using
formula (9.20) we obtain

$$\max_{a \le x \le b}|u(x)| \le \sum_{i=0}^{n}|f(a_i) - p(a_i)|\,\max_{a \le x \le b}|l_i(x)| \le \epsilon' M,$$

where $M = \sum_{i=0}^{n}\max_{a \le x \le b}|l_i(x)|$, which is some finite nonnegative number.
Note that $M$ depends only on $[a, b]$ and $a_0, a_1, \ldots, a_n$. Now, define $q(x)$ as
$q(x) = p(x) + u(x)$. Then

$$q(a_i) = p(a_i) + u(a_i) = f(a_i), \quad i = 0, 1, \ldots, n.$$

Furthermore,

$$|f(x) - q(x)| \le |f(x) - p(x)| + |u(x)| < \epsilon' + \epsilon' M < \epsilon \quad \text{for all } x \in [a, b].$$


9.3. APPROXIMATION BY SPLINE FUNCTIONS 

Approximation of a continuous function $f(x)$ with a single polynomial $p(x)$
may not be quite adequate in situations in which $f(x)$ represents a real
physical relationship. The behavior of such a function in one region may be
unrelated to its behavior in another region. This type of behavior may not be
satisfactorily matched by any polynomial. This is attributed to the fact that
the behavior of a polynomial everywhere is governed by its behavior in any
small region. In such situations, it would be more appropriate to partition the
domain of $f(x)$ into several intervals and then use a different approximating
polynomial, usually of low degree, in each subinterval. These polynomial
segments can be joined in a smooth way, which leads to what is called a
piecewise polynomial function.

By definition, a spline function is a piecewise polynomial of degree $n$. The
various polynomial segments (each of degree at most $n$) are joined together at points
called knots in such a way that the entire spline function is continuous and its
first $n - 1$ derivatives are also continuous. Spline functions were first
introduced by Schoenberg (1946).


9.3.1. Properties of Spline Functions 

Let $[a, b]$ be an interval, and let $a = \tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1} = b$ be
partition points in $[a, b]$. A spline function $s(x)$ of degree $n$ with knots at the
points $\tau_1, \tau_2, \ldots, \tau_m$ has the following properties:

i. $s(x)$ is a polynomial of degree not exceeding $n$ on each subinterval
$[\tau_{i-1}, \tau_i]$, $1 \le i \le m + 1$.

ii. $s(x)$ has continuous derivatives up to order $n - 1$ on $[a, b]$.

In particular, if $n = 1$, then the spline function is called a linear spline and
can be represented as

$$s(x) = \sum_{i=1}^{m} a_i\,|x - \tau_i|,$$

where $a_1, a_2, \ldots, a_m$ are fixed numbers. We note that between any two knots,
$|x - \tau_i|$, $i = 1, 2, \ldots, m$, represents a straight-line segment. Thus the graph of
$s(x)$ is made up of straight-line segments joined at the knots.



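We can obtain a linear spline that resembles Lagrange interpolation. Let
$\theta_0, \theta_1, \ldots, \theta_{m+1}$ be given real numbers, and consider the functions

$$l_0(x) = \begin{cases} \dfrac{x - \tau_1}{\tau_0 - \tau_1}, & \tau_0 \le x \le \tau_1, \\ 0, & \tau_1 \le x \le \tau_{m+1}, \end{cases}$$

for $1 \le i \le m$,

$$l_i(x) = \begin{cases} 0, & x \notin [\tau_{i-1}, \tau_{i+1}], \\ \dfrac{x - \tau_{i-1}}{\tau_i - \tau_{i-1}}, & \tau_{i-1} \le x \le \tau_i, \\ \dfrac{\tau_{i+1} - x}{\tau_{i+1} - \tau_i}, & \tau_i \le x \le \tau_{i+1}, \end{cases}$$

and

$$l_{m+1}(x) = \begin{cases} 0, & \tau_0 \le x \le \tau_m, \\ \dfrac{x - \tau_m}{\tau_{m+1} - \tau_m}, & \tau_m \le x \le \tau_{m+1}. \end{cases}$$

Then the linear spline

$$s(x) = \sum_{i=0}^{m+1}\theta_i\,l_i(x) \tag{9.21}$$

has the property that $s(\tau_i) = \theta_i$, $0 \le i \le m + 1$. It can be shown that the linear
spline having this property is unique.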
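The hat functions $l_0, \ldots, l_{m+1}$ and the spline (9.21) can be coded directly. In the sketch below, the knots and the values $\theta_i$ are hypothetical. The checks are that $s(\tau_i) = \theta_i$, that the hat functions sum to 1, and that the result agrees with NumPy's `np.interp`. That agreement is expected because `np.interp` computes the piecewise-linear interpolant of the same data, which by the uniqueness remark above is the same linear spline.

```python
import numpy as np

def hat(knots, i, x):
    """The function l_i(x) of the linear-spline basis; knots = [tau_0, ..., tau_{m+1}]."""
    t = np.asarray(knots, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    if i > 0:                                   # rising piece on [tau_{i-1}, tau_i]
        up = (x >= t[i - 1]) & (x <= t[i])
        y[up] = (x[up] - t[i - 1]) / (t[i] - t[i - 1])
    if i < len(t) - 1:                          # falling piece on [tau_i, tau_{i+1}]
        down = (x >= t[i]) & (x <= t[i + 1])
        y[down] = (t[i + 1] - x[down]) / (t[i + 1] - t[i])
    return y

def linear_spline(knots, theta, x):
    """s(x) = sum_i theta_i l_i(x), formula (9.21)."""
    return sum(th * hat(knots, i, x) for i, th in enumerate(theta))

knots = np.array([0.0, 0.3, 1.0, 1.6, 2.0])     # hypothetical tau_0, ..., tau_4 (m = 3)
theta = np.array([1.0, -0.5, 2.0, 0.0, 1.5])    # hypothetical theta_0, ..., theta_4
x = np.linspace(0.0, 2.0, 401)
print(linear_spline(knots, theta, knots))        # reproduces theta at the knots
```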

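Another special case is the cubic spline for $n = 3$. This is a widely used
spline function in many applications. It can be represented as

$$s(x) = s_i(x) = a_i + b_ix + c_ix^2 + d_ix^3, \quad \tau_{i-1} \le x \le \tau_i, \; i = 1, 2, \ldots, m + 1,$$

such that for $i = 1, 2, \ldots, m$,

$$s_i(\tau_i) = s_{i+1}(\tau_i), \quad s_i'(\tau_i) = s_{i+1}'(\tau_i), \quad s_i''(\tau_i) = s_{i+1}''(\tau_i).$$

In general, a spline of degree $n$ with knots at $\tau_1, \tau_2, \ldots, \tau_m$ is represented
as

$$s(x) = \sum_{i=1}^{m} e_i\,(x - \tau_i)_+^n + p(x), \tag{9.22}$$

where $e_1, e_2, \ldots, e_m$ are constants, $p(x)$ is a polynomial of degree $n$, and

$$(x - \tau_i)_+^n = \begin{cases} (x - \tau_i)^n, & x \ge \tau_i, \\ 0, & x < \tau_i. \end{cases}$$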

For an illustration, consider the cubic spline 

+ b^x + c^x^ + diX^, a <x < r, 

Ü2 + ^2^ C2X^ d2X^ , T<X <b. 

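Here, $s(x)$ along with its first and second derivatives must be continuous at $x = \tau$. Therefore, we must have

$$ a_1 + b_1\tau + c_1\tau^2 + d_1\tau^3 = a_2 + b_2\tau + c_2\tau^2 + d_2\tau^3, \tag{9.23} $$

$$ b_1 + 2c_1\tau + 3d_1\tau^2 = b_2 + 2c_2\tau + 3d_2\tau^2, \tag{9.24} $$

$$ 2c_1 + 6d_1\tau = 2c_2 + 6d_2\tau. \tag{9.25} $$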

Equation (9.25) can be written as

$$ c_1 - c_2 = 3(d_2 - d_1)\tau. \tag{9.26} $$

From equations (9.24) and (9.26) we get

$$ b_1 - b_2 + 3(d_2 - d_1)\tau^2 = 0. \tag{9.27} $$

Using now equations (9.26) and (9.27) in equation (9.23), we obtain $a_1 - a_2 + 3(d_1 - d_2)\tau^3 + 3(d_2 - d_1)\tau^3 + (d_1 - d_2)\tau^3 = 0$, or equivalently,

$$ a_1 - a_2 + (d_1 - d_2)\tau^3 = 0. \tag{9.28} $$

We conclude that (assuming $\tau \ne 0$)

$$ d_2 - d_1 = \frac{1}{\tau^3}(a_1 - a_2) = \frac{1}{3\tau^2}(b_2 - b_1) = \frac{1}{3\tau}(c_1 - c_2). \tag{9.29} $$


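Let us now express $s(x)$ in the form given by equation (9.22), that is,

$$ s(x) = e_1 (x - \tau)_+^3 + \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3. $$

In this case, since $(x - \tau)_+^3 = 0$ for $x \le \tau$, we have $\alpha_0 = a_1$, $\alpha_1 = b_1$, $\alpha_2 = c_1$, $\alpha_3 = d_1$, and, on expanding $e_1(x - \tau)^3$ for $x \ge \tau$,

$$ -e_1\tau^3 + \alpha_0 = a_2, \tag{9.30} $$

$$ 3e_1\tau^2 + \alpha_1 = b_2, \tag{9.31} $$

$$ -3e_1\tau + \alpha_2 = c_2, \tag{9.32} $$

$$ e_1 + \alpha_3 = d_2. \tag{9.33} $$

In light of equation (9.29), equations (9.30)-(9.33) have a common solution for $e_1$ given by $e_1 = d_2 - d_1 = (1/3\tau)(c_1 - c_2) = (1/3\tau^2)(b_2 - b_1) = (1/\tau^3)(a_1 - a_2)$.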
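The equivalence between the two-piece form and the truncated-power form (9.22) can be checked numerically. The sketch below (our own illustration; the knot $\tau$, the left-piece coefficients and $e_1$ are hypothetical values) builds $s_2$ from (9.30)-(9.33), checks the continuity conditions (9.23)-(9.25), and checks that every expression in (9.29) returns $e_1$.

```python
def cubic(coeffs, x):
    """Value of a + b x + c x^2 + d x^3 for coeffs = (a, b, c, d)."""
    a, b, c, d = coeffs
    return a + b * x + c * x**2 + d * x**3


def cubic_d1(coeffs, x):
    """First derivative b + 2 c x + 3 d x^2."""
    _, b, c, d = coeffs
    return b + 2 * c * x + 3 * d * x**2


def cubic_d2(coeffs, x):
    """Second derivative 2 c + 6 d x."""
    _, _, c, d = coeffs
    return 2 * c + 6 * d * x


def right_piece(left, e1, tau):
    """(a2, b2, c2, d2) from (9.30)-(9.33), using alpha_0..alpha_3 = a1, b1, c1, d1."""
    al0, al1, al2, al3 = left
    return (al0 - e1 * tau**3, al1 + 3 * e1 * tau**2, al2 - 3 * e1 * tau, al3 + e1)


def s_truncated(left, e1, tau, x):
    """s(x) = e1 (x - tau)_+^3 + alpha_0 + alpha_1 x + alpha_2 x^2 + alpha_3 x^3."""
    plus = (x - tau) ** 3 if x >= tau else 0.0
    return e1 * plus + cubic(left, x)


tau, e1 = 1.5, 0.75                  # hypothetical knot and coefficient e_1
left = (1.0, -2.0, 0.5, 0.25)        # hypothetical (a1, b1, c1, d1)
right = right_piece(left, e1, tau)   # (a2, b2, c2, d2)

# Continuity of s, s', s'' at the knot, as required by (9.23)-(9.25).
for f in (cubic, cubic_d1, cubic_d2):
    print(f.__name__, f(left, tau), f(right, tau))

# The expressions in (9.29) all recover e1 = d2 - d1.
a1, b1, c1, d1 = left
a2, b2, c2, d2 = right
print(d2 - d1, (a1 - a2) / tau**3, (b2 - b1) / (3 * tau**2), (c1 - c2) / (3 * tau))
```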


9.3.2. Error Bounds for Spline Approximation 

Let $a = \tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1} = b$ be a partition of $[a, b]$. We recall that the linear spline $s(x)$ given by formula (9.21) has the property that $s(\tau_i) = \theta_i$, $0 \le i \le m+1$, where $\theta_0, \theta_1, \ldots, \theta_{m+1}$ are any given real numbers. In particular, if $\theta_i$ is the value at $\tau_i$ of a function $f(x)$ defined on the interval $[a, b]$, then $s(x)$ provides a spline approximation of $f(x)$ over $[a, b]$ which agrees with $f(x)$ at $\tau_0, \tau_1, \ldots, \tau_{m+1}$. If $f(x)$ has continuous derivatives up to order 2 over $[a, b]$, then an upper bound on the error of approximation is given by (see De Boor, 1978, page 40)

$$ \max_{a \le x \le b} |f(x) - s(x)| \le \frac{1}{8}\Big(\max_i \Delta\tau_i\Big)^2 \max_{a \le x \le b} |f^{(2)}(x)|, $$

where $\Delta\tau_i = \tau_{i+1} - \tau_i$, $i = 0, 1, \ldots, m$. This error bound can be made small by reducing the value of $\max_i \Delta\tau_i$.
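The bound can be observed numerically. In the sketch below (our own illustration, with hypothetical knots), $f(x) = \sin x$ on $[0, \pi]$, so $\max|f''| = 1$; the maximum error is estimated on a fine grid and compared with $\frac{1}{8}(\max_i \Delta\tau_i)^2$. Halving the mesh on a uniform partition should reduce the error by roughly a factor of 4, as the $(\max_i \Delta\tau_i)^2$ factor suggests.

```python
import bisect
import math


def linear_interp(knots, values, x):
    """Piecewise-linear interpolant through (knots[i], values[i]), i.e. s(x) of (9.21)."""
    i = min(max(bisect.bisect_right(knots, x) - 1, 0), len(knots) - 2)
    t0, t1 = knots[i], knots[i + 1]
    w = (x - t0) / (t1 - t0)
    return (1 - w) * values[i] + w * values[i + 1]


def max_error(f, knots, samples=20001):
    """Approximate max |f(x) - s(x)| on a fine uniform grid over [tau_0, tau_{m+1}]."""
    a, b = knots[0], knots[-1]
    values = [f(t) for t in knots]
    grid = (a + (b - a) * k / (samples - 1) for k in range(samples))
    return max(abs(f(x) - linear_interp(knots, values, x)) for x in grid)


def linear_spline_bound(knots, max_abs_f2):
    """(1/8) * (max_i Delta tau_i)^2 * max |f''(x)|."""
    h = max(t1 - t0 for t0, t1 in zip(knots, knots[1:]))
    return h * h / 8 * max_abs_f2


def uniform_knots(m):
    """m equal subintervals of [0, pi]."""
    return [math.pi * k / m for k in range(m + 1)]


# f(x) = sin x on [0, pi], so max |f''| = 1; the knots are an arbitrary illustration.
knots = [0.0, 0.4, 1.0, 1.7, 2.3, 2.8, math.pi]
print(max_error(math.sin, knots), "<=", linear_spline_bound(knots, 1.0))

# Halving max Delta tau_i should cut the error by roughly a factor of 4.
print(max_error(math.sin, uniform_knots(10)) / max_error(math.sin, uniform_knots(20)))
```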

A more efficient and smoother spline approximation than the one provided by the linear spline is the commonly used cubic spline approximation. We recall that a cubic spline defined on $[a, b]$ is a piecewise cubic polynomial that is twice continuously differentiable. Let $f(x)$ be defined on $[a, b]$. There exists a unique cubic spline $s(x)$ that satisfies the following interpolatory constraints:

$$ s(\tau_i) = f(\tau_i), \quad i = 0, 1, \ldots, m+1, \qquad s'(\tau_0) = f'(\tau_0), \qquad s'(\tau_{m+1}) = f'(\tau_{m+1}) \tag{9.34} $$

(see Prenter, 1975, Section 4.2).

If $f(x)$ has continuous derivatives up to order 4 on $[a, b]$, then information on the error of approximation, which results from using a cubic spline, can be obtained from the following theorem, whose proof is given in Hall (1968):

Theorem 9.3.1. Let $a = \tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1} = b$ be a partition of $[a, b]$. Let $s(x)$ be a cubic spline associated with $f(x)$ that satisfies the constraints described in (9.34). If $f(x)$ has continuous derivatives up to order 4 on $[a, b]$, then

$$ \max_{a \le x \le b} |f(x) - s(x)| \le \frac{5}{384}\Big(\max_i \Delta\tau_i\Big)^4 \max_{a \le x \le b} |f^{(4)}(x)|, $$

where $\Delta\tau_i = \tau_{i+1} - \tau_i$, $i = 0, 1, \ldots, m$.

Another advantage of cubic spline approximation is the fact that it can be used to approximate the first-order and second-order derivatives of $f(x)$. Hall and Meyer (1976) proved that if $f(x)$ satisfies the conditions of Theorem 9.3.1, then

$$ \max_{a \le x \le b} |f'(x) - s'(x)| \le \frac{1}{24}\Big(\max_i \Delta\tau_i\Big)^3 \max_{a \le x \le b} |f^{(4)}(x)|, $$

$$ \max_{a \le x \le b} |f''(x) - s''(x)| \le \frac{3}{8}\Big(\max_i \Delta\tau_i\Big)^2 \max_{a \le x \le b} |f^{(4)}(x)|. $$

Furthermore, the bounds concerning $|f(x) - s(x)|$ and $|f'(x) - s'(x)|$ are best possible.
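To see Theorem 9.3.1 at work, the following pure-Python sketch constructs the cubic spline satisfying (9.34) by solving the usual tridiagonal system for the second derivatives $M_i = s''(\tau_i)$ (the "moment" formulation, which is our choice of method and not necessarily the one used by Hall or Prenter). It then compares the observed error for $f(x) = \sin x$ on $[0, \pi]$ with $\frac{5}{384}(\max_i \Delta\tau_i)^4$. Because the spline is unique, a cubic $f$ must be reproduced exactly, which serves as a sanity check of the code. Function names are hypothetical.

```python
import bisect
import math


def clamped_spline_moments(t, y, fpa, fpb):
    """Second derivatives M_i = s''(t_i) of the cubic spline with s(t_i) = y_i,
    s'(t_0) = fpa and s'(t_N) = fpb, i.e. the constraints (9.34)."""
    N = len(t) - 1
    h = [t[i + 1] - t[i] for i in range(N)]
    sub, diag, sup, rhs = [0.0] * (N + 1), [0.0] * (N + 1), [0.0] * (N + 1), [0.0] * (N + 1)
    diag[0], sup[0] = 2 * h[0], h[0]
    rhs[0] = 6 * ((y[1] - y[0]) / h[0] - fpa)
    for i in range(1, N):
        sub[i], diag[i], sup[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    sub[N], diag[N] = h[N - 1], 2 * h[N - 1]
    rhs[N] = 6 * (fpb - (y[N] - y[N - 1]) / h[N - 1])
    # Thomas algorithm for the tridiagonal system.
    cp, dp = [0.0] * (N + 1), [0.0] * (N + 1)
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, N + 1):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / denom
    M = [0.0] * (N + 1)
    M[N] = dp[N]
    for i in range(N - 1, -1, -1):
        M[i] = dp[i] - cp[i] * M[i + 1]
    return M


def spline_eval(t, y, M, x):
    """Evaluate the cubic spline on the subinterval [t_i, t_{i+1}] containing x."""
    i = min(max(bisect.bisect_right(t, x) - 1, 0), len(t) - 2)
    h = t[i + 1] - t[i]
    A, B = t[i + 1] - x, x - t[i]
    return ((M[i] * A**3 + M[i + 1] * B**3) / (6 * h)
            + (y[i] - M[i] * h * h / 6) * A / h
            + (y[i + 1] - M[i + 1] * h * h / 6) * B / h)


def max_spline_error(f, fprime, a, b, intervals, samples=2001):
    """Approximate max |f(x) - s(x)| for equally spaced knots on [a, b]."""
    t = [a + (b - a) * k / intervals for k in range(intervals + 1)]
    y = [f(v) for v in t]
    M = clamped_spline_moments(t, y, fprime(a), fprime(b))
    grid = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - spline_eval(t, y, M, x)) for x in grid)


# f(x) = sin x on [0, pi]: max |f^(4)| = 1, ten equal subintervals.
step = math.pi / 10
print(max_spline_error(math.sin, math.cos, 0.0, math.pi, 10), "<=", 5 / 384 * step**4)
# A cubic is reproduced by its spline (up to rounding).
print(max_spline_error(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0, 4))
```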


9.4. APPLICATIONS IN STATISTICS 

There is a wide variety of applications of polynomial approximation in statistics. In this section, we discuss the use of Lagrange interpolation in optimal design theory and the role of spline approximation in regression analysis. Other applications will be seen later in Chapter 10 (Section 10.9).


9.4.1. Approximate Linearization of Nonlinear Models 
by Lagrange Interpolation 

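We recall from Section 8.6 that a nonlinear model is one of the form

$$ y(\mathbf{x}) = h(\mathbf{x}, \boldsymbol{\theta}) + \epsilon, \tag{9.35} $$

where $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$ is a vector of $k$ input variables, $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)'$ is a vector of $p$ unknown parameters, $\epsilon$ is a random error, and $h(\mathbf{x}, \boldsymbol{\theta})$ is a known function which is nonlinear in at least one element of $\boldsymbol{\theta}$.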

We also recall that the choice of design for model (9.35), on the basis of the Box-Lucas criterion, depends on the values of the elements of $\boldsymbol{\theta}$ that appear nonlinearly in the model. To overcome this undesirable design dependence problem, one possible approach is to construct an approximation to the mean response function $h(\mathbf{x}, \boldsymbol{\theta})$ with a Lagrange interpolating polynomial. This approximation can then be utilized to obtain a design for parameter estimation which does not depend on the parameter vector $\boldsymbol{\theta}$. We shall restrict our consideration of model (9.35) to the case of a single input variable $x$.

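Let us suppose that the region of interest, $R$, is the interval $[a, b]$, and that $\boldsymbol{\theta}$ belongs to a parameter space $\Omega$. We assume that:

a. $h(x, \boldsymbol{\theta})$ has continuous partial derivatives up to order $r+1$ with respect to $x$ over $[a, b]$ for all $\boldsymbol{\theta} \in \Omega$, where $r$ is such that $r + 1 \ge p$, with $p$ being the number of parameters in model (9.35), and is large enough so that

$$ \frac{2}{(r+1)!}\left(\frac{b-a}{4}\right)^{r+1} \sup_{a \le x \le b}\left|\frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}}\right| < \delta \tag{9.36} $$

for all $\boldsymbol{\theta} \in \Omega$, where $\delta$ is a small positive constant chosen appropriately so that the Lagrange interpolation of $h(x, \boldsymbol{\theta})$ achieves a certain accuracy.

b. $h(x, \boldsymbol{\theta})$ has continuous first-order partial derivatives with respect to the elements of $\boldsymbol{\theta}$.

c. For any set of distinct points $x_0, x_1, \ldots, x_r$ such that $a \le x_0 < x_1 < \cdots < x_r \le b$, where $r$ is the integer defined in (a), the $p \times (r+1)$ matrix

$$ \mathbf{U}(\boldsymbol{\theta}) = [\nabla h(x_0, \boldsymbol{\theta}) : \nabla h(x_1, \boldsymbol{\theta}) : \cdots : \nabla h(x_r, \boldsymbol{\theta})] $$

is of rank $p$, where $\nabla h(x_i, \boldsymbol{\theta})$ is the vector of partial derivatives of $h(x_i, \boldsymbol{\theta})$ with respect to the elements of $\boldsymbol{\theta}$ ($i = 0, 1, \ldots, r$).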

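Let us now consider the points $z_0, z_1, \ldots, z_r$, where $z_i$ is the $i$th Chebyshev point defined by formula (9.18). Let $p_r(x, \boldsymbol{\theta})$ denote the corresponding Lagrange interpolating polynomial for $h(x, \boldsymbol{\theta})$ over $[a, b]$, which utilizes the $z_i$'s as interpolation points. Then, by formula (9.14) we have

$$ p_r(x, \boldsymbol{\theta}) = \sum_{i=0}^{r} h(z_i, \boldsymbol{\theta})\, l_i(x), \tag{9.37} $$

where $l_i(x)$ is a polynomial of degree $r$ which can be obtained from formula (9.13) by using $z_i$ as the $i$th interpolation point ($i = 0, 1, \ldots, r$). By inequality (9.19), an upper bound on the error of approximating $h(x, \boldsymbol{\theta})$ with $p_r(x, \boldsymbol{\theta})$ is given by

$$ \sup_{a \le x \le b} |h(x, \boldsymbol{\theta}) - p_r(x, \boldsymbol{\theta})| \le \frac{2}{(r+1)!}\left(\frac{b-a}{4}\right)^{r+1} \sup_{a \le x \le b}\left|\frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}}\right|. $$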


However, by inequality (9.36), this upper bound is less than $\delta$. We then have

$$ \sup_{a \le x \le b} |h(x, \boldsymbol{\theta}) - p_r(x, \boldsymbol{\theta})| < \delta \tag{9.38} $$

for all $\boldsymbol{\theta} \in \Omega$. This provides the desired accuracy of approximation.

On the basis of the above arguments, an approximate representation of model (9.35) is given by

$$ y(x) = p_r(x, \boldsymbol{\theta}) + \epsilon. \tag{9.39} $$

Model (9.39) will now be utilized in place of model (9.35) to construct an optimal design for estimating $\boldsymbol{\theta}$.
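A small numerical illustration of (9.37)-(9.38) follows. It interpolates a hypothetical mean response $h(x) = \theta_1 e^{-\theta_2 x}$ (with made-up values $\theta_1 = 1$, $\theta_2 = 0.5$) at $r + 1 = 6$ Chebyshev points on $[0, 10]$ and checks the observed error against the right-hand side of the bound above. The Chebyshev-point formula coded here, $z_i = \frac{a+b}{2} + \frac{b-a}{2}\cos\frac{(2i+1)\pi}{2r+2}$, is our assumption for (9.18); it does reproduce the Chebyshev column of Table 9.2.

```python
import math


def chebyshev_points(a, b, r):
    """z_i = (a+b)/2 + (b-a)/2 * cos((2i+1) pi / (2r+2)), i = 0, ..., r (assumed form of (9.18))."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * i + 1) * math.pi / (2 * r + 2))
            for i in range(r + 1)]


def lagrange_basis(z, j, x):
    """l_j(x) = prod_{k != j} (x - z_k) / (z_j - z_k)."""
    out = 1.0
    for k, zk in enumerate(z):
        if k != j:
            out *= (x - zk) / (z[j] - zk)
    return out


def lagrange_interp(z, hz, x):
    """p_r(x) = sum_j h(z_j) l_j(x), as in (9.37)."""
    return sum(hz[j] * lagrange_basis(z, j, x) for j in range(len(z)))


def chebyshev_error_bound(a, b, r, sup_deriv):
    """2/(r+1)! * ((b-a)/4)^(r+1) * sup |d^{r+1} h / dx^{r+1}|."""
    return 2 / math.factorial(r + 1) * ((b - a) / 4) ** (r + 1) * sup_deriv


theta1, theta2, a, b, r = 1.0, 0.5, 0.0, 10.0, 5   # hypothetical values


def mean_response(x):
    """Hypothetical h(x) = theta1 * exp(-theta2 * x)."""
    return theta1 * math.exp(-theta2 * x)


z = chebyshev_points(a, b, r)
hz = [mean_response(v) for v in z]
grid = [a + (b - a) * k / 2000 for k in range(2001)]
err = max(abs(mean_response(x) - lagrange_interp(z, hz, x)) for x in grid)
# |h^{(r+1)}(x)| = theta1 * theta2^(r+1) * exp(-theta2 x) is largest at x = 0.
bound = chebyshev_error_bound(a, b, r, theta1 * theta2 ** (r + 1))
print(err, "<=", bound)
```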

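Let us now apply the Box-Lucas criterion described in Section 8.6 to the approximate mean response in model (9.39). In this case, the matrix $\mathbf{H}(\boldsymbol{\theta})$ [see model (8.66)] is an $n \times p$ matrix whose $(u, i)$th element is $\partial p_r(x_u, \boldsymbol{\theta})/\partial\theta_i$, where $x_u$ is the design setting at the $u$th experimental run ($u = 1, 2, \ldots, n$) and $n$ is the number of experimental runs. From formula (9.37) we then have

$$ \frac{\partial p_r(x_u, \boldsymbol{\theta})}{\partial \theta_i} = \sum_{j=0}^{r} \frac{\partial h(z_j, \boldsymbol{\theta})}{\partial \theta_i}\, l_j(x_u), \qquad i = 1, 2, \ldots, p. $$

These equations can be written as

$$ \nabla p_r(x_u, \boldsymbol{\theta}) = \mathbf{U}(\boldsymbol{\theta})\,\boldsymbol{\lambda}(x_u), $$

where $\boldsymbol{\lambda}(x_u) = [l_0(x_u), l_1(x_u), \ldots, l_r(x_u)]'$ and $\mathbf{U}(\boldsymbol{\theta})$ is the $p \times (r+1)$ matrix

$$ \mathbf{U}(\boldsymbol{\theta}) = [\nabla h(z_0, \boldsymbol{\theta}) : \nabla h(z_1, \boldsymbol{\theta}) : \cdots : \nabla h(z_r, \boldsymbol{\theta})]. $$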


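By assumption (c), $\mathbf{U}(\boldsymbol{\theta})$ is of rank $p$. The matrix $\mathbf{H}(\boldsymbol{\theta})$ is therefore of the form

$$ \mathbf{H}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}\,\mathbf{U}'(\boldsymbol{\theta}), $$

where

$$ \boldsymbol{\Lambda}' = [\boldsymbol{\lambda}(x_1) : \boldsymbol{\lambda}(x_2) : \cdots : \boldsymbol{\lambda}(x_n)]. $$

Thus

$$ \mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta}) = \mathbf{U}(\boldsymbol{\theta})\boldsymbol{\Lambda}'\boldsymbol{\Lambda}\,\mathbf{U}'(\boldsymbol{\theta}). \tag{9.40} $$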


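If $n \ge r+1$ and at least $r+1$ of the design points (that is, $x_1, x_2, \ldots, x_n$) are distinct, then $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ is a nonsingular matrix. To show this, it is sufficient to prove that $\boldsymbol{\Lambda}$ is of full column rank $r+1$. If not, then there must exist constants $\alpha_0, \alpha_1, \ldots, \alpha_r$, not all equal to zero, such that

$$ \sum_{i=0}^{r} \alpha_i\, l_i(x_u) = 0, \qquad u = 1, 2, \ldots, n. $$

This indicates that the polynomial $\sum_{i=0}^{r} \alpha_i l_i(x)$, which has degree at most $r$ and is not identically zero [since $l_i(z_j)$ equals 1 if $i = j$ and 0 otherwise, its value at $z_j$ is $\alpha_j$], has $n$ roots, namely, $x_1, x_2, \ldots, x_n$. This is not possible, because $n \ge r+1$ and at least $r+1$ of the $x_u$'s ($u = 1, 2, \ldots, n$) are distinct (a nonzero polynomial of degree at most $r$ has at most $r$ distinct roots). This contradiction implies that $\boldsymbol{\Lambda}$ is of full column rank and $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ is therefore nonsingular.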

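Applying the Box-Lucas design criterion to the approximating model (9.39) amounts to finding the design settings that maximize $\det[\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})]$. From formula (9.40) we have

$$ \det[\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})] = \det[\mathbf{U}(\boldsymbol{\theta})\boldsymbol{\Lambda}'\boldsymbol{\Lambda}\,\mathbf{U}'(\boldsymbol{\theta})]. \tag{9.41} $$

We note that the matrix $\boldsymbol{\Lambda}'\boldsymbol{\Lambda} = \sum_{u=1}^{n} \boldsymbol{\lambda}(x_u)\boldsymbol{\lambda}'(x_u)$ depends only on the design settings. Let $\kappa_{\min}(x_1, x_2, \ldots, x_n)$ and $\kappa_{\max}(x_1, x_2, \ldots, x_n)$ denote, respectively, the smallest and the largest eigenvalue of $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$. These eigenvalues are positive, since $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$, being positive semidefinite and nonsingular (as was shown earlier), is positive definite. From formula (9.41) we conclude that

$$ \det[\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})]\,\kappa_{\min}^{p}(x_1, \ldots, x_n) \le \det[\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})] \le \det[\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})]\,\kappa_{\max}^{p}(x_1, \ldots, x_n). \tag{9.42} $$

This double inequality follows from the fact that the matrices $\kappa_{\max}(x_1, \ldots, x_n)\,\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta}) - \mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})$ and $\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta}) - \kappa_{\min}(x_1, \ldots, x_n)\,\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})$ are positive semidefinite. An application of Theorem 2.3.19(1) to these matrices results in the double inequality (9.42) (why?). Note that the determinant of $\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})$ is not zero, since $\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})$, which is of order $p \times p$, is of rank $p$ by assumption (c).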

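Now, from the double inequality (9.42) we deduce that there exists a number $\gamma$, $0 \le \gamma \le 1$, such that

$$ \det[\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})] = \left[\gamma\,\kappa_{\min}^{p}(x_1, \ldots, x_n) + (1-\gamma)\,\kappa_{\max}^{p}(x_1, \ldots, x_n)\right] \det[\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})]. $$

If $\gamma$ is integrated out, we obtain

$$ \int_0^1 \det[\mathbf{H}'(\boldsymbol{\theta})\mathbf{H}(\boldsymbol{\theta})]\,d\gamma = \frac{1}{2}\left[\kappa_{\min}^{p}(x_1, \ldots, x_n) + \kappa_{\max}^{p}(x_1, \ldots, x_n)\right] \det[\mathbf{U}(\boldsymbol{\theta})\mathbf{U}'(\boldsymbol{\theta})]. $$

Consequently, to construct an optimal design we can consider finding $x_1, x_2, \ldots, x_n$ that maximize the function

$$ \psi(x_1, x_2, \ldots, x_n) = \frac{1}{2}\left[\kappa_{\min}^{p}(x_1, \ldots, x_n) + \kappa_{\max}^{p}(x_1, \ldots, x_n)\right]. \tag{9.43} $$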





This is a modified version of the Box-Lucas criterion. Its advantage is that the optimal design is free of $\boldsymbol{\theta}$. We therefore call such a design a parameter-free design. The maximization of $\psi(x_1, x_2, \ldots, x_n)$ can be conveniently carried out by using a FORTRAN program written by Conlon (1991), which is based on Price's (1977) controlled random search procedure.
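The criterion (9.43) itself is simple to evaluate for a given design; only its maximization needs a global optimizer such as Price's procedure. The sketch below (our own illustration, not Conlon's program) forms $\boldsymbol{\Lambda}'\boldsymbol{\Lambda} = \sum_u \boldsymbol{\lambda}(x_u)\boldsymbol{\lambda}'(x_u)$ from the Lagrange basis at Chebyshev points, finds its extreme eigenvalues with a hand-written cyclic Jacobi iteration (chosen only to keep the example dependency-free), and returns $\psi$. The Chebyshev formula is the same assumed form of (9.18) used above, and all function names are hypothetical. When the design consists of the Chebyshev points themselves, $\boldsymbol{\Lambda} = \mathbf{I}$ and $\psi = 1$.

```python
import math


def chebyshev_points(a, b, r):
    """Assumed form of (9.18); it reproduces the Chebyshev column of Table 9.2."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * i + 1) * math.pi / (2 * r + 2))
            for i in range(r + 1)]


def lagrange_basis(z, j, x):
    """l_j(x) = prod_{k != j} (x - z_k) / (z_j - z_k)."""
    out = 1.0
    for k, zk in enumerate(z):
        if k != j:
            out *= (x - zk) / (z[j] - zk)
    return out


def symmetric_eigenvalues(S, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, sorted ascending."""
    a = [row[:] for row in S]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):          # columns p, q:  A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rows p, q:     A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def psi(z, design, p):
    """psi of (9.43): (kappa_min^p + kappa_max^p) / 2, with kappa the eigenvalues of Lambda' Lambda."""
    m = len(z)
    G = [[0.0] * m for _ in range(m)]
    for x in design:
        lam = [lagrange_basis(z, j, x) for j in range(m)]
        for i in range(m):
            for j in range(m):
                G[i][j] += lam[i] * lam[j]
    eig = symmetric_eigenvalues(G)
    return 0.5 * (eig[0] ** p + eig[-1] ** p)


z = chebyshev_points(0.0, 10.0, 9)
print(psi(z, z, p=2))        # design = Chebyshev points: Lambda = I, so psi = 1
print(psi(z, z + z, p=2))    # each point replicated twice: Lambda' Lambda = 2I
```

A global search over $(x_1, \ldots, x_n) \in [a, b]^n$ would then call `psi` as its objective.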

Example 9.4.1. Let us consider the nonlinear model used by Box and Lucas (1959) of a consecutive first-order chemical reaction in which a raw material A reacts to form a product B, which in turn decomposes to form substance C. After time $x$ has elapsed, the mean yield of the intermediate product B is given by

$$ h(x, \boldsymbol{\theta}) = \frac{\theta_1}{\theta_1 - \theta_2}\left(e^{-\theta_2 x} - e^{-\theta_1 x}\right), $$

where $\theta_1$ and $\theta_2$ are the rate constants for the reactions A → B and B → C, respectively.

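Suppose that the region of interest $R$ is the interval $[0, 10]$. Let the parameter space $\Omega$ be such that $0 < \theta_1 < 1$, $0 < \theta_2 < 1$. It can be verified that

$$ \frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}} = (-1)^{r+1}\,\frac{\theta_1}{\theta_1 - \theta_2}\left(\theta_2^{r+1} e^{-x\theta_2} - \theta_1^{r+1} e^{-x\theta_1}\right). $$

Let us consider the function $w(x, \phi) = \phi^{r+1} e^{-x\phi}$. By the mean value theorem (Theorem 4.2.2),

$$ \theta_2^{r+1} e^{-x\theta_2} - \theta_1^{r+1} e^{-x\theta_1} = (\theta_2 - \theta_1)\,\frac{\partial w(x, \theta^*)}{\partial \phi}, $$

where $\partial w(x, \theta^*)/\partial\phi$ is the partial derivative of $w(x, \phi)$ with respect to $\phi$ evaluated at $\theta^*$, and where $\theta^*$ is between $\theta_1$ and $\theta_2$. Since $\partial w(x, \phi)/\partial\phi = \phi^{r} e^{-x\phi}(r + 1 - x\phi)$, we have

$$ \frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}} = (-1)^{r}\,\theta_1\,\theta^{*r} e^{-x\theta^*}\,(r + 1 - x\theta^*). $$

Hence, since $0 < \theta_1 < 1$, $0 < \theta^* < 1$, and $e^{-x\theta^*} \le 1$,

$$ \sup_{0 \le x \le 10}\left|\frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}}\right| \le \sup_{0 \le x \le 10}\left[e^{-x\theta^*}\,\theta_1\,\theta^{*r}\,|r + 1 - x\theta^*|\right] \le \sup_{0 \le x \le 10} |r + 1 - x\theta^*|. $$

However,

$$ |r + 1 - x\theta^*| = \begin{cases} r + 1 - x\theta^*, & \text{if } r + 1 \ge x\theta^*, \\ x\theta^* - r - 1, & \text{if } r + 1 < x\theta^*. \end{cases} $$

Since $0 \le x\theta^* < 10$, then

$$ \sup_{0 \le x \le 10} |r + 1 - x\theta^*| \le \max(r + 1,\; 9 - r). $$

We then have

$$ \sup_{0 \le x \le 10}\left|\frac{\partial^{r+1} h(x, \boldsymbol{\theta})}{\partial x^{r+1}}\right| \le \max(r + 1,\; 9 - r). $$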


By inequality (9.36), the integer $r$ is determined such that

$$ \frac{2}{(r+1)!}\left(\frac{10}{4}\right)^{r+1} \max(r + 1,\; 9 - r) < \delta. \tag{9.44} $$


If we choose $\delta = 0.053$, for example, then it can be verified that the smallest positive integer that satisfies inequality (9.44) is $r = 9$. The Chebyshev points in formula (9.18) that correspond to this value of $r$ are given in Table 9.2. On choosing $n$, the number of design points, to be equal to $r + 1 = 10$, where all ten design points are distinct, the matrix $\boldsymbol{\Lambda}$ in formula (9.40) will be nonsingular of order $10 \times 10$. Using Conlon's (1991) FORTRAN program for the maximization of the function $\psi$ in formula (9.43) with $p = 2$, it can be shown that the maximum value of $\psi$ is 17.457. The corresponding optimal values of $x_1, x_2, \ldots, x_{10}$ are given in Table 9.2.
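The claim that $r = 9$ is the smallest integer satisfying (9.44) for $\delta = 0.053$ is easy to verify by direct search. The helper names below are our own.

```python
import math


def lhs_944(r):
    """Left-hand side of (9.44): 2/(r+1)! * (10/4)^(r+1) * max(r+1, 9-r)."""
    return 2 / math.factorial(r + 1) * (10 / 4) ** (r + 1) * max(r + 1, 9 - r)


def smallest_r(delta):
    """Smallest positive integer r with lhs_944(r) < delta."""
    r = 1
    while lhs_944(r) >= delta:
        r += 1
    return r


for r in range(6, 11):
    print(r, round(lhs_944(r), 5))
print("smallest r for delta = 0.053:", smallest_r(0.053))
```

For $r = 8$ the left-hand side is still about 0.189, while for $r = 9$ it drops to about 0.0526, just under $\delta = 0.053$.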


Table 9.2. Chebyshev Points and Optimal Design Points for Example 9.4.1

Chebyshev Points    Optimal Design Points
9.938               9.989
9.455               9.984
8.536               9.983
7.270               9.966
5.782               9.542
4.218               7.044
2.730               6.078
1.464               4.038
0.545               1.381
0.062               0.692
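The Chebyshev column of Table 9.2 can be regenerated from the formula $z_i = \frac{a+b}{2} + \frac{b-a}{2}\cos\frac{(2i+1)\pi}{2r+2}$ with $a = 0$, $b = 10$, $r = 9$. We take this as the form of (9.18), an assumption that the close agreement below supports.

```python
import math


def chebyshev_points(a, b, r):
    """z_i = (a+b)/2 + (b-a)/2 * cos((2i+1) pi / (2r+2)), i = 0, ..., r."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * i + 1) * math.pi / (2 * r + 2))
            for i in range(r + 1)]


table_9_2 = [9.938, 9.455, 8.536, 7.270, 5.782, 4.218, 2.730, 1.464, 0.545, 0.062]
z = chebyshev_points(0.0, 10.0, 9)
for computed, tabulated in zip(z, table_9_2):
    print(f"{computed:.3f}  {tabulated:.3f}")
```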





9.4.2. Splines in Statistics 

There is a broad variety of work on splines in statistics. Spline functions are well suited to practical applications involving data that arise from the physical world rather than the mathematical world. It is therefore only natural that splines have many useful applications in statistics. Some of these applications will be discussed in this section.

9.4.2.1. The Use of Cubic Splines in Regression
Let us consider fitting the model

$$ y = g(x) + \epsilon, \tag{9.45} $$

where $g(x)$ is the mean response at $x$ and $\epsilon$ is a random error. Suppose that the domain of $x$ is divided into a set of $m+1$ intervals by the points $\tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1}$ such that on the $i$th interval ($i = 1, 2, \ldots, m+1$), $g(x)$ is represented by the cubic spline

$$ s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3, \qquad \tau_{i-1} \le x \le \tau_i. \tag{9.46} $$

As was seen earlier in Section 9.3.1, the parameters $a_i, b_i, c_i, d_i$ ($i = 1, 2, \ldots, m+1$) are subject to the following continuity restrictions:

$$ a_i + b_i\tau_i + c_i\tau_i^2 + d_i\tau_i^3 = a_{i+1} + b_{i+1}\tau_i + c_{i+1}\tau_i^2 + d_{i+1}\tau_i^3, \tag{9.47} $$

that is, $s_i(\tau_i) = s_{i+1}(\tau_i)$, $i = 1, 2, \ldots, m$;

$$ b_i + 2c_i\tau_i + 3d_i\tau_i^2 = b_{i+1} + 2c_{i+1}\tau_i + 3d_{i+1}\tau_i^2, \tag{9.48} $$

that is, $s_i'(\tau_i) = s_{i+1}'(\tau_i)$, $i = 1, 2, \ldots, m$; and

$$ 2c_i + 6d_i\tau_i = 2c_{i+1} + 6d_{i+1}\tau_i, \tag{9.49} $$

that is, $s_i''(\tau_i) = s_{i+1}''(\tau_i)$, $i = 1, 2, \ldots, m$. The number of unknown parameters in model (9.45) is equal to $4(m+1)$. The $3m$ continuity restrictions (9.47)-(9.49) reduce the dimensionality of the parameter space to $m+4$. However, only $m+2$ parameters can be estimated. This is because the spline method does not estimate the parameters of the $s_i$'s directly, but estimates the ordinates of the $s_i$'s at the points $\tau_0, \tau_1, \ldots, \tau_{m+1}$, that is, $s_1(\tau_0)$ and $s_i(\tau_i)$, $i = 1, 2, \ldots, m+1$. Two additional restrictions are therefore needed. These are chosen to be of the form (see Poirier, 1973, page 516; Buse and Lim, 1977, page 64):

$$ s_1''(\tau_0) = \pi_0\, s_1''(\tau_1), $$

or

$$ 2c_1 + 6d_1\tau_0 = \pi_0(2c_1 + 6d_1\tau_1), \tag{9.50} $$

and

$$ s_{m+1}''(\tau_{m+1}) = \pi_{m+1}\, s_{m+1}''(\tau_m), $$

or

$$ 2c_{m+1} + 6d_{m+1}\tau_{m+1} = \pi_{m+1}(2c_{m+1} + 6d_{m+1}\tau_m), \tag{9.51} $$

where $\pi_0$ and $\pi_{m+1}$ are known.

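Let $y_1, y_2, \ldots, y_n$ be $n$ observations on the response $y$, where $n > m+2$, such that $n_i$ observations are taken in the $i$th interval $[\tau_{i-1}, \tau_i]$, $i = 1, 2, \ldots, m+1$. Thus $n = \sum_{i=1}^{m+1} n_i$. If $y_{i1}, y_{i2}, \ldots, y_{in_i}$ are the observations in the $i$th interval ($i = 1, 2, \ldots, m+1$), then from model (9.45) we have

$$ y_{ij} = s_i(x_{ij}) + \epsilon_{ij}, \qquad i = 1, 2, \ldots, m+1;\; j = 1, 2, \ldots, n_i, \tag{9.52} $$

where $x_{ij}$ is the setting of $x$ for which $y = y_{ij}$, and the $\epsilon_{ij}$'s are distributed independently with means equal to zero and a common variance $\sigma^2$. The estimation of the parameters of model (9.45) is then reduced to a restricted least-squares problem with formulas (9.47)-(9.51) representing linear restrictions on the $4(m+1)$ parameters of the model [see, for example, Searle (1971, Section 3.6), for a discussion concerning least-squares estimation under linear restrictions on the fitted model's parameters]. Using matrix notation, model (9.52) and the linear restrictions (9.47)-(9.51) can be expressed as

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \tag{9.53} $$

$$ \mathbf{C}\boldsymbol{\beta} = \boldsymbol{\delta}, \tag{9.54} $$

where $\mathbf{y} = (\mathbf{y}_1' : \mathbf{y}_2' : \cdots : \mathbf{y}_{m+1}')'$ with $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})'$, $i = 1, 2, \ldots, m+1$; $\mathbf{X} = \mathrm{Diag}(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_{m+1})$ is a block-diagonal matrix of order $n \times [4(m+1)]$ with $\mathbf{X}_i$ being a matrix of order $n_i \times 4$ whose $j$th row is of the form $(1, x_{ij}, x_{ij}^2, x_{ij}^3)$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, m+1$; $\boldsymbol{\beta} = (\boldsymbol{\beta}_1' : \boldsymbol{\beta}_2' : \cdots : \boldsymbol{\beta}_{m+1}')'$ with $\boldsymbol{\beta}_i = (a_i, b_i, c_i, d_i)'$, $i = 1, 2, \ldots, m+1$; and $\boldsymbol{\epsilon} = (\boldsymbol{\epsilon}_1' : \boldsymbol{\epsilon}_2' : \cdots : \boldsymbol{\epsilon}_{m+1}')'$, where $\boldsymbol{\epsilon}_i$ is the vector of random errors associated with the observations in the $i$th interval, $i = 1, 2, \ldots, m+1$. Furthermore, $\mathbf{C} = [\mathbf{C}_0' : \mathbf{C}_1' : \mathbf{C}_2' : \mathbf{C}_3']'$, where, for $l = 0, 1, 2$,

$$ \mathbf{C}_l = \begin{bmatrix} -\mathbf{e}_{1l}' & \mathbf{e}_{1l}' & \mathbf{0}' & \cdots & \mathbf{0}' & \mathbf{0}' \\ \mathbf{0}' & -\mathbf{e}_{2l}' & \mathbf{e}_{2l}' & \cdots & \mathbf{0}' & \mathbf{0}' \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \mathbf{0}' & \mathbf{0}' & \mathbf{0}' & \cdots & -\mathbf{e}_{ml}' & \mathbf{e}_{ml}' \end{bmatrix} $$

is a matrix of order $m \times [4(m+1)]$ such that $\mathbf{e}_{i0}' = (1, \tau_i, \tau_i^2, \tau_i^3)$, $\mathbf{e}_{i1}' = (0, 1, 2\tau_i, 3\tau_i^2)$, $\mathbf{e}_{i2}' = (0, 0, 2, 6\tau_i)$, $i = 1, 2, \ldots, m$, and

$$ \mathbf{C}_3 = \begin{bmatrix} 0 & 0 & 2(\pi_0 - 1) & 6(\pi_0\tau_1 - \tau_0) & \cdots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 2(\pi_{m+1} - 1) & 6(\pi_{m+1}\tau_m - \tau_{m+1}) \end{bmatrix} $$

is a $2 \times [4(m+1)]$ matrix. Finally, $\boldsymbol{\delta} = (\boldsymbol{\delta}_0' : \boldsymbol{\delta}_1' : \boldsymbol{\delta}_2' : \boldsymbol{\delta}_3')' = \mathbf{0}$, where the partitioning of $\boldsymbol{\delta}$ into $\boldsymbol{\delta}_0$, $\boldsymbol{\delta}_1$, $\boldsymbol{\delta}_2$, and $\boldsymbol{\delta}_3$ conforms to that of $\mathbf{C}$. Consequently, and on the basis of formula (103) in Searle (1971, page 113), the least-squares estimator of $\boldsymbol{\beta}$ for model (9.53) under the restriction described by formula (9.54) is given by

$$ \hat{\boldsymbol{\beta}}_r = \hat{\boldsymbol{\beta}} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}(\mathbf{C}\hat{\boldsymbol{\beta}} - \boldsymbol{\delta}) = \hat{\boldsymbol{\beta}} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}, $$

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is the ordinary least-squares estimator of $\boldsymbol{\beta}$.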
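The restricted estimator can be computed directly from the formula above. The dependency-free sketch below (our own illustration, not Buse and Lim's code) uses a hypothetical setting with one interior knot $\tau = 0$ ($m = 1$, so $\boldsymbol{\beta} = (a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2)'$), five made-up observations in each interval, and only the continuity blocks $\mathbf{C}_0, \mathbf{C}_1, \mathbf{C}_2$; the end restrictions $\mathbf{C}_3$ are omitted for brevity. It checks that $\mathbf{C}\hat{\boldsymbol{\beta}}_r = \mathbf{0}$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small, well-conditioned systems)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def restricted_ls(X, y, C):
    """beta_r = beta_hat - (X'X)^{-1} C' [C (X'X)^{-1} C']^{-1} C beta_hat  (delta = 0)."""
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    beta_hat = matmul(XtX_inv, matmul(Xt, [[v] for v in y]))
    Ct = transpose(C)
    K = matmul(matmul(XtX_inv, Ct), inverse(matmul(matmul(C, XtX_inv), Ct)))
    correction = matmul(K, matmul(C, beta_hat))
    return [bh[0] - cr[0] for bh, cr in zip(beta_hat, correction)]


# Hypothetical setting: one interior knot tau (m = 1), two cubic pieces.
tau = 0.0
x1 = [-1.0, -0.8, -0.6, -0.4, -0.2]          # observations in [tau_0, tau_1]
x2 = [0.0, 0.25, 0.5, 0.75, 1.0]             # observations in [tau_1, tau_2]
X = ([[1.0, x, x**2, x**3, 0.0, 0.0, 0.0, 0.0] for x in x1]
     + [[0.0, 0.0, 0.0, 0.0, 1.0, x, x**2, x**3] for x in x2])
# Rows of C_0, C_1, C_2: -e_{1l}' in block 1 and +e_{1l}' in block 2.
e = [[1.0, tau, tau**2, tau**3], [0.0, 1.0, 2 * tau, 3 * tau**2], [0.0, 0.0, 2.0, 6 * tau]]
C = [[-v for v in row] + row for row in e]
y = [abs(x) + 0.1 * x**2 for x in x1 + x2]   # made-up responses with a kink at tau
beta_r = restricted_ls(X, y, C)
print([round(v, 4) for v in beta_r])
print([sum(cij * bj for cij, bj in zip(row, beta_r)) for row in C])   # ~0: restrictions hold
```

If the data come from a single cubic, the unrestricted fit already satisfies the restrictions and the correction term vanishes.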

This estimation procedure, which was developed by Buse and Lim (1977), demonstrates that the fitting of a cubic spline regression model can be reduced to a restricted least-squares problem. Buse and Lim presented a numerical example based on Indianapolis 500 race data over the period 1911-1971 to illustrate the implementation of their procedure.

Other papers of interest in the area of regression splines include those of Poirier (1973) and Gallant and Fuller (1973). The paper by Poirier discusses the basic theory of cubic regression splines from an economic point of view. In the paper by Gallant and Fuller, the knots are treated as unknown parameters rather than being fixed. Thus in their procedure, the knots must be estimated, which causes the estimation process to become nonlinear.

9.4.2.2. Designs for Fitting Spline Models 

A number of papers have addressed the problem of finding a design to estimate the parameters of model (9.45), where $g(x)$ is represented by a spline function. We shall make a brief reference to some of these papers.

Agarwal and Studden (1978) considered a representation of $g(x)$ over $0 \le x \le 1$ by a linear spline $s(x)$, which has the form given by (9.21). Here, $g''(x)$ is assumed to be continuous. Recall that the $\theta_i$ coefficients in formula (9.21) are the values of $s$ at $\tau_0, \tau_1, \ldots, \tau_{m+1}$.

Let $x_1, x_2, \ldots, x_r$ be $r$ design points in $[0, 1]$. Let $\bar{y}_i$ denote the average of $n_i$ observations taken at $x_i$ ($i = 1, 2, \ldots, r$). The vector $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_{m+1})'$ can therefore be estimated by

$$ \hat{\boldsymbol{\theta}} = \mathbf{A}\bar{\mathbf{y}}, \tag{9.55} $$

where $\bar{\mathbf{y}} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_r)'$ and $\mathbf{A}$ is an $(m+2) \times r$ matrix. Hence, an estimate of $g(x)$ is given by

$$ \hat{g}(x) = \mathbf{l}'(x)\hat{\boldsymbol{\theta}} = \mathbf{l}'(x)\mathbf{A}\bar{\mathbf{y}}, \tag{9.56} $$

where $\mathbf{l}(x) = [l_0(x), l_1(x), \ldots, l_{m+1}(x)]'$.





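Now, $E(\hat{\boldsymbol{\theta}}) = \mathbf{A}\mathbf{g}_r$, where $\mathbf{g}_r = [g(x_1), g(x_2), \ldots, g(x_r)]'$. Thus $E[\hat{g}(x)] = \mathbf{l}'(x)\mathbf{A}\mathbf{g}_r$, and the variance of $\hat{g}(x)$ is

$$ \mathrm{Var}[\hat{g}(x)] = E[\mathbf{l}'(x)\hat{\boldsymbol{\theta}} - \mathbf{l}'(x)\mathbf{A}\mathbf{g}_r]^2 = \frac{\sigma^2}{n}\,\mathbf{l}'(x)\mathbf{A}\mathbf{D}^{-1}\mathbf{A}'\mathbf{l}(x), $$

where $\mathbf{D}$ is an $r \times r$ diagonal matrix with diagonal elements $n_1/n, n_2/n, \ldots, n_r/n$. The mean squared error of $\hat{g}(x)$ is the variance plus the squared bias of $\hat{g}(x)$. It follows that the integrated mean squared error (IMSE) of $\hat{g}(x)$ (see Section 8.4.3) is

$$ J = \frac{n\omega}{\sigma^2}\int_0^1 \mathrm{Var}[\hat{g}(x)]\,dx + \frac{n\omega}{\sigma^2}\int_0^1 \mathrm{Bias}^2[\hat{g}(x)]\,dx = V + B, $$

where $\omega = \left(\int_0^1 dx\right)^{-1} = 1$, and

$$ \mathrm{Bias}^2[\hat{g}(x)] = [g(x) - \mathbf{l}'(x)\mathbf{A}\mathbf{g}_r]^2. $$

Thus

$$ J = \int_0^1 \mathbf{l}'(x)\mathbf{A}\mathbf{D}^{-1}\mathbf{A}'\mathbf{l}(x)\,dx + \frac{n}{\sigma^2}\int_0^1 [g(x) - \mathbf{l}'(x)\mathbf{A}\mathbf{g}_r]^2\,dx = \mathrm{tr}(\mathbf{A}\mathbf{D}^{-1}\mathbf{A}'\mathbf{M}) + \frac{n}{\sigma^2}\int_0^1 [g(x) - \mathbf{l}'(x)\mathbf{A}\mathbf{g}_r]^2\,dx, $$

where $\mathbf{M} = \int_0^1 \mathbf{l}(x)\mathbf{l}'(x)\,dx$.

Agarwal and Studden (1978) proposed to minimize $J$ with respect to (i) the design (that is, $x_1, x_2, \ldots, x_r$ as well as $n_1, n_2, \ldots, n_r$), (ii) the matrix $\mathbf{A}$, and (iii) the knots $\tau_1, \tau_2, \ldots, \tau_m$, assuming that $g$ is known.

Park (1978) adopted the $D$-optimality criterion (see Section 8.5) for the choice of design when $g(x)$ is represented by a spline of the form given by formula (9.22) with only one intermediate knot.

Draper, Guttman, and Lipow (1977) extended the design criterion based on the minimization of the average squared bias $B$ (see Section 8.4.3) to situations involving spline models. In particular, they considered fitting first-order or second-order models when the true mean response is of the second order or the third order, respectively.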

9.4.2.3. Other Applications of Splines in Statistics

Spline functions have many other useful applications in both theoretical and applied statistical research. For example, splines are used in nonparametric regression and data smoothing, nonparametric density estimation, and time series analysis. They are also utilized in the analysis of response curves in agriculture and economics. The review articles by Wegman and Wright (1983) and Ramsay (1988) contain many references on the various uses of splines in statistics (see also the article by Smith, 1979). An overview of the role of splines in regression analysis is given in Eubank (1984).


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Agarwal, G. G., and W. J. Studden (1978). “Asymptotic design and estimation using linear splines.” Comm. Statist. Simulation Comput., 7, 309-319.

Box, G. E. P., and H. L. Lucas (1959). “Design of experiments in nonlinear situations.” Biometrika, 46, 77-90.

Buse, A., and L. Lim (1977). “Cubic splines as a special case of restricted least squares.” J. Amer. Statist. Assoc., 72, 64-68.

Cheney, E. W. (1982). Introduction to Approximation Theory, 2nd ed. Chelsea, New York. (The Weierstrass approximation theorem and Lagrange interpolation are covered in Chap. 3; least-squares approximation is discussed in Chap. 4.)

Conlon, M. (1991). “The controlled random search procedure for function optimization.” Personal communication. (This is a FORTRAN file for implementing Price's controlled random search procedure.)

Cornish, E. A., and R. A. Fisher (1937). “Moments and cumulants in the specification of distribution.” Rev. Internat. Statist. Inst., 5, 307-320.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton. (This classic book provides the mathematical foundation of statistics. Chap. 17 is a good source for approximation of density functions.)

Davis, P. J. (1975). Interpolation and Approximation. Dover, New York. (Chaps. 2, 3, 6, 8, and 10 are relevant to the material on Lagrange interpolation, least-squares approximation, and orthogonal polynomials.)

De Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York. (Chaps. 1 and 2 provide a good coverage of Lagrange interpolation, particularly with regard to the use of Chebyshev points. Chap. 4 discusses cubic spline approximation.)

Draper, N. R., I. Guttman, and P. Lipow (1977). “All-bias designs for spline functions joined at the axes.” J. Amer. Statist. Assoc., 72, 424-429.

Eubank, R. L. (1984). “Approximate regression models and splines.” Comm. Statist. Theory Methods, 13, 433-484.

Gallant, A. R., and W. A. Fuller (1973). “Fitting segmented polynomial regression models whose join points have to be estimated.” J. Amer. Statist. Assoc., 68, 144-147.

Graybill, F. A. (1983). Matrices with Applications in Statistics, 2nd ed. Wadsworth, Belmont, California.

Hall, C. A. (1968). “On error bounds for spline interpolation.” J. Approx. Theory, 1, 209-218.

Hall, C. A., and W. W. Meyer (1976). “Optimal error bounds for cubic spline interpolation.” J. Approx. Theory, 16, 105-122.

Harris, B. (1966). Theory of Probability. Addison-Wesley, Reading, Massachusetts.

Johnson, N. L., and S. Kotz (1970). Continuous Univariate Distributions-1. Houghton Mifflin, Boston. (Chap. 12 contains a good discussion concerning the Cornish-Fisher expansion of percentage points.)

Kendall, M. G., and A. Stuart (1977). The Advanced Theory of Statistics, Vol. 1, 4th ed. Macmillan, New York. (This classic book provides a good source for learning about the Gram-Charlier series of type A and the Cornish-Fisher expansion.)

Lancaster, P., and K. Salkauskas (1986). Curve and Surface Fitting. Academic Press, London. (This book covers the foundations and major features of several basic methods for curve and surface fitting that are currently in use.)

Park, S. H. (1978). “Experimental designs for fitting segmented polynomial regression models.” Technometrics, 20, 151-154.

Poirier, D. J. (1973). “Piecewise regression using cubic splines.” J. Amer. Statist. Assoc., 68, 515-524.

Powell, M. J. D. (1967). “On the maximum errors of polynomial approximation defined by interpolation and by least squares criteria.” Comput. J., 9, 404-407.

Prenter, P. M. (1975). Splines and Variational Methods. Wiley, New York. (Lagrange interpolation is covered in Chap. 2; cubic splines are discussed in Chap. 4. An interesting feature of this book is its coverage of polynomial approximation of a function of several variables.)

Price, W. L. (1977). “A controlled random search procedure for global optimization.” Comput. J., 20, 367-370.

Ramsay, J. O. (1988). “Monotone regression splines in action.” Statist. Sci., 3, 425-461.

Rice, J. R. (1969). The Approximation of Functions, Vol. 2. Addison-Wesley, Reading, Massachusetts. (Approximation by spline functions is presented in Chap. 10.)

Rivlin, T. J. (1969). An Introduction to the Approximation of Functions. Dover, New York. (This book provides an introduction to some of the most significant methods of approximation of functions by polynomials. Spline approximation is also discussed.)

Schoenberg, I. J. (1946). “Contributions to the problem of approximation of equidistant data by analytic functions.” Quart. Appl. Math., 4, Part A, 45-99; Part B, 112-141.

Searle, S. R. (1971). Linear Models. Wiley, New York.

Smith, P. L. (1979). “Splines as a useful and convenient statistical tool.” Amer. Statist., 33, 57-62.

Szidarovszky, F., and S. Yakowitz (1978). Principles and Procedures of Numerical Analysis. Plenum Press, New York. (Chap. 2 provides a brief introduction to approximation and interpolation of functions.)

Wegman, E. J., and I. W. Wright (1983). “Splines in statistics.” J. Amer. Statist. Assoc., 78, 351-365.

Wold, S. (1974). “Spline functions in data analysis.” Technometrics, 16, 1-11.





EXERCISES 
In Mathematics 

9.1. Let $f(x)$ be a function with a continuous derivative on $[0, 1]$, and let $b_n(x)$ be the $n$th degree Bernstein approximating polynomial of $f$. Then, for some constant $c$ and for all $n$,

$$ \sup_{0 \le x \le 1} |f(x) - b_n(x)| \le \frac{c}{n^{1/2}}. $$


9.2. Prove Lemma 9.1.1. 

9.3. Prove Lemma 9.1.2. 

9.4. Show that for every interval $[-a, a]$ there is a sequence of polynomials $p_n(x)$ such that $p_n(0) = 0$ and $\lim_{n \to \infty} p_n(x) = |x|$ uniformly on $[-a, a]$.

9.5. Suppose that $f(x)$ is continuous on $[0, 1]$ and that

$$ \int_0^1 f(x)\,x^n\,dx = 0, \qquad n = 0, 1, 2, \ldots. $$

Show that $f(x) = 0$ on $[0, 1]$.

[Hint: $\int_0^1 f(x)p_n(x)\,dx = 0$, where $p_n(x)$ is any polynomial of degree $n$.]

9.6. Suppose that the function $f(x)$ has $n+1$ continuous derivatives on $[a, b]$. Let $a = a_0 < a_1 < \cdots < a_n = b$ be $n+1$ points in $[a, b]$. Then

$$ \sup_{a \le x \le b} |f(x) - p(x)| \le \frac{\tau_{n+1}\, h^{n+1}}{4(n+1)}, $$

where $p(x)$ is the Lagrange polynomial defined by formula (9.14), $\tau_{n+1} = \sup_{a \le x \le b} |f^{(n+1)}(x)|$, and $h = \max_i(a_{i+1} - a_i)$, $i = 0, 1, \ldots, n-1$.

[Hint: Show that $\left|\prod_{i=0}^{n}(x - a_i)\right| \le n!\,(h^{n+1}/4)$.]

9.7. Apply Lagrange interpolation to approximate the function $f(x) = \log x$ over the interval $[3.50, 3.80]$ using $a_0 = 3.50$, $a_1 = 3.60$, $a_2 = 3.70$, and $a_3 = 3.80$ as interpolation points. Compute an upper bound on the error of approximation.

9.8. Let $a = \tau_0 < \tau_1 < \cdots < \tau_n = b$ be a partition of $[a, b]$. Suppose that $f(x)$ has continuous derivatives up to order 2 over $[a, b]$. Consider a cubic spline $s(x)$ that satisfies

$$ s(\tau_i) = f(\tau_i), \quad i = 0, 1, \ldots, n, \qquad s'(a) = f'(a), \qquad s'(b) = f'(b). $$

Show that

$$ \int_a^b [f''(x)]^2\,dx \ge \int_a^b [s''(x)]^2\,dx. $$


9.9. Determine the cubic spline approximation of the function $f(x) = \cos(2\pi x)$ over the interval $[0, \pi]$ using five evenly spaced knots. Give an upper bound on the error of approximation.


In Statistics 

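9.10. Consider the nonlinear model

$$ y(x) = h(x, \boldsymbol{\theta}) + \epsilon, $$

where

$$ h(x, \boldsymbol{\theta}) = \theta_1\left(1 - \theta_2 e^{-\theta_3 x}\right), $$

such that $0 \le \theta_1 \le 50$, $0 \le \theta_2 \le 1$, $0 \le \theta_3 \le 1$. Obtain a Lagrange interpolating polynomial that approximates the mean response function $h(x, \boldsymbol{\theta})$ over the region $[0, 8]$ with an error not exceeding $\delta = 0.05$.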

9.11. Consider the nonlinear model

$$ y = \alpha + (0.49 - \alpha)\exp[-\beta(x - 8)] + \epsilon, $$

where $\epsilon$ is a random error with a zero mean and a variance $\sigma^2$. Suppose that the region of interest is the interval $[10, 40]$, and that the parameter space $\Omega$ is such that $0.36 \le \alpha \le 0.41$, $0.06 \le \beta \le 0.16$. Let $s(x)$ be the cubic spline that approximates the mean response, that is, $\eta(x, \alpha, \beta) = \alpha + (0.49 - \alpha)\exp[-\beta(x - 8)]$, over $[10, 40]$. Determine the number of knots needed so that

$$ \max_{10 \le x \le 40} |\eta(x, \alpha, \beta) - s(x)| < 0.001 $$

for all $(\alpha, \beta) \in \Omega$.





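9.12. Consider fitting the spline model

$$ y = \beta_0 + \beta_1 x + \beta_2 (x - \alpha)_+^2 + \epsilon $$

over the interval $[-1, 1]$, where $\alpha$ is a known constant, $-1 \le \alpha \le 1$. A three-point design consisting of $x_1, x_2, x_3$ with $-1 \le x_1 < \alpha \le x_2 < x_3 \le 1$ is used to fit the model. Using matrix notation, the model is written as

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, $$

where $\mathbf{X}$ is the matrix

$$ \mathbf{X} = \begin{bmatrix} 1 & x_1 & 0 \\ 1 & x_2 & (x_2 - \alpha)^2 \\ 1 & x_3 & (x_3 - \alpha)^2 \end{bmatrix}, $$

and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2)'$. Determine $x_1, x_2, x_3$ so that the design is $D$-optimal, that is, it maximizes the determinant of $\mathbf{X}'\mathbf{X}$.

[Note: See Park (1978).]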



CHAPTER 10


Orthogonal Polynomials


The subject of orthogonal polynomials can be traced back to the work of the French mathematician Adrien-Marie Legendre (1752-1833) on planetary motion. These polynomials have important applications in physics, quantum mechanics, mathematical statistics, and other areas in mathematics.

This chapter provides an exposition of the properties of orthogonal polynomials. Emphasis will be placed on Legendre, Chebyshev, Jacobi, Laguerre, and Hermite polynomials. In addition, applications of these polynomials in statistics will be discussed in Section 10.9.


10.1. INTRODUCTION 

Suppose that $f(x)$ and $g(x)$ are two continuous functions on $[a, b]$. Let $w(x)$ be a positive function that is Riemann integrable on $[a, b]$. The dot product of $f(x)$ and $g(x)$ with respect to $w(x)$, which is denoted by $(f \cdot g)_\omega$, is defined as

$$ (f \cdot g)_\omega = \int_a^b f(x)g(x)w(x)\,dx. $$

The norm of $f(x)$ with respect to $w(x)$, denoted by $\|f\|_\omega$, is defined as $\|f\|_\omega = \left[\int_a^b f^2(x)w(x)\,dx\right]^{1/2}$. The functions $f(x)$ and $g(x)$ are said to be orthogonal [with respect to $w(x)$] if $(f \cdot g)_\omega = 0$. Furthermore, a sequence $\{f_n(x)\}_{n=0}^{\infty}$ of continuous functions defined on $[a, b]$ is said to be orthogonal with respect to $w(x)$ if $(f_m \cdot f_n)_\omega = 0$ for $m \ne n$. If, in addition, $\|f_n\|_\omega = 1$ for all $n$, then the functions $f_n(x)$, $n = 0, 1, 2, \ldots$, are called orthonormal. In particular, if $S = \{p_n(x)\}_{n=0}^{\infty}$ is a sequence of polynomials such that $(p_n \cdot p_m)_\omega = 0$ for all $m \ne n$, then $S$ forms a sequence of polynomials orthogonal with respect to $w(x)$.
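The dot product and norm can be approximated by numerical quadrature. In the sketch below (our own illustration; composite Simpson's rule is our choice of quadrature), `dot` and `norm` implement $(f \cdot g)_\omega$ and $\|f\|_\omega$ for a general weight. For the weight $w(x) = (1 - x^2)^{-1/2}$ on $[-1, 1]$, whose endpoint singularities would defeat Simpson's rule, `chebyshev_dot` first substitutes $x = \cos t$, which turns the integral into $\int_0^\pi f(\cos t)\,g(\cos t)\,dt$. The checks use $T_1(x) = x$ and $T_2(x) = 2x^2 - 1$.

```python
import math


def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    total += 4 * sum(g(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(g(a + k * step) for k in range(2, n, 2))
    return total * step / 3


def dot(f, g, w, a, b):
    """(f . g)_omega = integral_a^b f(x) g(x) w(x) dx."""
    return simpson(lambda x: f(x) * g(x) * w(x), a, b)


def norm(f, w, a, b):
    """||f||_omega = sqrt((f . f)_omega)."""
    return math.sqrt(dot(f, f, w, a, b))


def chebyshev_dot(f, g):
    """(f . g)_omega for w(x) = (1 - x^2)^(-1/2) on [-1, 1], computed after x = cos t."""
    return simpson(lambda t: f(math.cos(t)) * g(math.cos(t)), 0.0, math.pi)


def unit_weight(x):
    return 1.0


def T1(x):
    return x


def T2(x):
    return 2 * x**2 - 1


print(dot(T1, lambda x: x**2, unit_weight, -1.0, 1.0))   # about 0: x and x^2 are orthogonal for w = 1
print(norm(T1, unit_weight, -1.0, 1.0) ** 2)              # about 2/3
print(chebyshev_dot(T1, T2), chebyshev_dot(T1, T1))       # about 0 and pi/2
```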

A sequence of orthogonal polynomials can be constructed on the basis of 
the following theorem: 




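Theorem 10.1.1. The polynomials $\{p_n(x)\}_{n=0}^{\infty}$ which are defined according to the following recurrence relation are orthogonal:

$$ \begin{aligned} p_0(x) &= 1, \\ p_1(x) &= x - \frac{(x p_0 \cdot p_0)_\omega}{\|p_0\|_\omega^2} = x - \frac{(x \cdot 1)_\omega}{\|1\|_\omega^2}, \\ p_n(x) &= (x - a_n)p_{n-1}(x) - b_n p_{n-2}(x), \qquad n = 2, 3, \ldots, \end{aligned} \tag{10.1} $$

where

$$ a_n = \frac{(x p_{n-1} \cdot p_{n-1})_\omega}{\|p_{n-1}\|_\omega^2}, \tag{10.2} $$

$$ b_n = \frac{(x p_{n-1} \cdot p_{n-2})_\omega}{\|p_{n-2}\|_\omega^2}. \tag{10.3} $$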



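Proof. We show by mathematical induction on $n$ that $(p_n \cdot p_i)_\omega = 0$ for $i < n$. For $n = 1$,

$$ (p_1 \cdot p_0)_\omega = \int_a^b \left[x - \frac{(x \cdot 1)_\omega}{\|1\|_\omega^2}\right] w(x)\,dx = (x \cdot 1)_\omega - \frac{(x \cdot 1)_\omega}{\|1\|_\omega^2}\,\|1\|_\omega^2 = 0. $$

Now, suppose that the assertion is true for $n - 1$ ($n \ge 2$). To show that it is true for $n$, we have that

$$ (p_n \cdot p_i)_\omega = \int_a^b [(x - a_n)p_{n-1}(x) - b_n p_{n-2}(x)]\,p_i(x)w(x)\,dx = (x p_{n-1} \cdot p_i)_\omega - a_n(p_{n-1} \cdot p_i)_\omega - b_n(p_{n-2} \cdot p_i)_\omega. $$

Thus, for $i = n - 1$,

$$ (p_n \cdot p_i)_\omega = (x p_{n-1} \cdot p_{n-1})_\omega - a_n\|p_{n-1}\|_\omega^2 - b_n(p_{n-2} \cdot p_{n-1})_\omega = 0 $$

by the definition of $a_n$ in (10.2) and the fact that $(p_{n-2} \cdot p_{n-1})_\omega = (p_{n-1} \cdot p_{n-2})_\omega = 0$. Similarly, for $i = n - 2$,

$$ (p_n \cdot p_i)_\omega = (x p_{n-1} \cdot p_{n-2})_\omega - a_n(p_{n-1} \cdot p_{n-2})_\omega - b_n(p_{n-2} \cdot p_{n-2})_\omega = (x p_{n-1} \cdot p_{n-2})_\omega - b_n\|p_{n-2}\|_\omega^2 = 0, \quad \text{by (10.3)}. $$

Finally, for $i < n - 2$, we have

$$ (p_n \cdot p_i)_\omega = (x p_{n-1} \cdot p_i)_\omega - a_n(p_{n-1} \cdot p_i)_\omega - b_n(p_{n-2} \cdot p_i)_\omega = \int_a^b x\,p_{n-1}(x)p_i(x)w(x)\,dx. \tag{10.4} $$

But, from the recurrence relation,

$$ p_{i+1}(x) = (x - a_{i+1})p_i(x) - b_{i+1}p_{i-1}(x), $$

that is,

$$ x p_i(x) = p_{i+1}(x) + a_{i+1}p_i(x) + b_{i+1}p_{i-1}(x) $$

(for $i = 0$ the term involving $p_{i-1}$ is absent). It follows that

$$ \int_a^b x\,p_{n-1}(x)p_i(x)w(x)\,dx = \int_a^b p_{n-1}(x)[p_{i+1}(x) + a_{i+1}p_i(x) + b_{i+1}p_{i-1}(x)]\,w(x)\,dx = (p_{n-1} \cdot p_{i+1})_\omega + a_{i+1}(p_{n-1} \cdot p_i)_\omega + b_{i+1}(p_{n-1} \cdot p_{i-1})_\omega = 0, $$

by the induction hypothesis, since $i + 1 < n - 1$. Hence, by (10.4), $(p_n \cdot p_i)_\omega = 0$. □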

It is easy to see from the recurrence relation (10.1) that $p_n(x)$ is of degree $n$, and the coefficient of $x^n$ is equal to one. Furthermore, we have the following corollaries:

Corollary 10.1.1. An arbitrary polynomial of degree $\le n$ is uniquely expressible as a linear combination of $p_0(x), p_1(x), \ldots, p_n(x)$.

Corollary 10.1.2. The coefficient of $x^{n-1}$ in $p_n(x)$ is $-\sum_{i=1}^n a_i$ ($n \ge 1$).

Proof. If $d_n$ denotes the coefficient of $x^{n-1}$ in $p_n(x)$ ($n \ge 2$), then by comparing the coefficients of $x^{n-1}$ on both sides of the recurrence relation (10.1), we obtain

$$d_n = d_{n-1} - a_n, \quad n = 2, 3, \ldots. \qquad (10.5)$$

The result follows from (10.5) and by noting that $d_1 = -a_1$. □

Another property of orthogonal polynomials is given by the following theorem:

Theorem 10.1.2. If $\{p_n(x)\}_{n=0}^\infty$ is a sequence of orthogonal polynomials with respect to $w(x)$ on $[a, b]$, then the zeros of $p_n(x)$ ($n \ge 1$) are all real, distinct, and located in the interior of $[a, b]$.

Proof. Since $(p_n\cdot p_0)_\omega = 0$ for $n \ge 1$, then $\int_a^b p_n(x)w(x)\,dx = 0$. This indicates that $p_n(x)$ must change sign at least once in $(a, b)$ [recall that $w(x)$ is positive]. Suppose that $p_n(x)$ changes sign between $a$ and $b$ at just $k$ points, denoted by $x_1, x_2, \ldots, x_k$. Let $g(x) = (x - x_1)(x - x_2)\cdots(x - x_k)$. Then, $p_n(x)g(x)$ is a polynomial with no zeros of odd multiplicity in $(a, b)$. Hence, $\int_a^b p_n(x)g(x)w(x)\,dx \ne 0$, that is, $(p_n\cdot g)_\omega \ne 0$. If $k < n$, then we have a contradiction by the fact that $p_n$ is orthogonal to $g(x)$ [$g(x)$, being a polynomial of degree $k$, can be expressed as a linear combination of $p_0(x), p_1(x), \ldots, p_k(x)$ by Corollary 10.1.1]. Consequently, $k = n$, and $p_n(x)$ has $n$ distinct zeros in the interior of $[a, b]$. □
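The construction in Theorem 10.1.1 and the zero property in Theorem 10.1.2 can be checked numerically. What follows is a minimal sketch, not the book's procedure. It approximates every inner product $(\cdot)_\omega$ by Gauss-Legendre quadrature mapped to $[a, b]$, which is accurate but only approximate for a general smooth weight. The helper names `make_inner` and `orthogonal_polys` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss


def make_inner(a, b, w, npts=80):
    """Return (f, g) -> int_a^b f(x) g(x) w(x) dx, approximated by Gauss-Legendre quadrature."""
    t, wt = leggauss(npts)                    # nodes and weights on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)     # nodes mapped to [a, b]
    scale = 0.5 * (b - a) * wt * w(x)         # Jacobian * quadrature weight * w(x)
    return lambda f, g: float(np.sum(scale * f(x) * g(x)))


def orthogonal_polys(n_max, inner):
    """Monic p_0, ..., p_{n_max} built from the recurrence (10.1)-(10.3)."""
    X = Polynomial([0.0, 1.0])                # the polynomial x
    p = [Polynomial([1.0])]                   # p_0 = 1
    a1 = inner(X * p[0], p[0]) / inner(p[0], p[0])
    p.append(X - a1)                          # p_1 = x - (x . 1)/||1||^2
    for n in range(2, n_max + 1):
        a_n = inner(X * p[n - 1], p[n - 1]) / inner(p[n - 1], p[n - 1])   # (10.2)
        b_n = inner(X * p[n - 1], p[n - 2]) / inner(p[n - 2], p[n - 2])   # (10.3)
        p.append((X - a_n) * p[n - 1] - b_n * p[n - 2])                   # (10.1)
    return p


# w(x) = 1 on [-1, 1] reproduces the monic Legendre polynomials.
legendre_inner = make_inner(-1.0, 1.0, np.ones_like)
p = orthogonal_polys(5, legendre_inner)
print(p[2].coef)            # approximately [-1/3, 0, 1]

# A non-classical weight, w(x) = exp(x) on [0, 2]; Theorem 10.1.2 says the zeros of p_6 lie in (0, 2).
exp_inner = make_inner(0.0, 2.0, np.exp)
q = orthogonal_polys(6, exp_inner)
print(np.sort(q[6].roots()))
```

The same discretized inner product is used both to build the polynomials and to test them. As a result, they are orthogonal to rounding error with respect to that quadrature rule, whatever its accuracy for the exact integral.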


Particular orthogonal polynomials can be derived depending on the choice of the interval $[a, b]$ and the weight function $w(x)$. For example, the well-known orthogonal polynomials listed below are obtained by the following selections of $[a, b]$ and $w(x)$:

Orthogonal Polynomial            a          b          w(x)
Legendre                         $-1$       $1$        $1$
Jacobi                           $-1$       $1$        $(1-x)^\alpha(1+x)^\beta$, $\alpha, \beta > -1$
Chebyshev of the first kind      $-1$       $1$        $(1-x^2)^{-1/2}$
Chebyshev of the second kind     $-1$       $1$        $(1-x^2)^{1/2}$
Hermite                          $-\infty$  $\infty$   $e^{-x^2/2}$
Laguerre                         $0$        $\infty$   $e^{-x}x^\alpha$, $\alpha > -1$

These polynomials are called classical orthogonal polynomials. We shall study their properties and methods of derivation.


10.2. LEGENDRE POLYNOMIALS 


These polynomials are derived by applying the so-called Rodrigues formula

$$p_n(x) = \frac{1}{2^n n!}\,\frac{d^n(x^2-1)^n}{dx^n}, \quad n = 0, 1, 2, \ldots.$$

Thus, for $n = 0, 1, 2, 3, 4$, for example, we have

$$p_0(x) = 1,$$

$$p_1(x) = x,$$

$$p_2(x) = \tfrac{3}{2}x^2 - \tfrac{1}{2},$$

$$p_3(x) = \tfrac{5}{2}x^3 - \tfrac{3}{2}x,$$

$$p_4(x) = \tfrac{35}{8}x^4 - \tfrac{15}{4}x^2 + \tfrac{3}{8}.$$

From the Rodrigues formula it follows that $p_n(x)$ is a polynomial of degree $n$ and the coefficient of $x^n$ is $\binom{2n}{n}/2^n$. We can multiply $p_n(x)$ by $2^n/\binom{2n}{n}$ to make the coefficient of $x^n$ equal to one ($n = 1, 2, \ldots$).

Another definition of Legendre polynomials is obtained by means of the generating function,

$$g(x, r) = \frac{1}{(1 - 2xr + r^2)^{1/2}},$$

by expanding it as a power series in $r$ for sufficiently small values of $r$. The coefficient of $r^n$ in this expansion is $p_n(x)$, $n = 0, 1, \ldots$, that is,

$$g(x, r) = \sum_{n=0}^\infty p_n(x)r^n.$$

To demonstrate this, let us consider expanding $(1-z)^{-1/2}$ in a neighborhood of zero, where $z = 2xr - r^2$:

$$\frac{1}{(1-z)^{1/2}} = 1 + \frac{1}{2}z + \frac{3}{8}z^2 + \frac{5}{16}z^3 + \cdots, \quad |z| < 1,$$

$$= 1 + \frac{1}{2}(2xr - r^2) + \frac{3}{8}(2xr - r^2)^2 + \frac{5}{16}(2xr - r^2)^3 + \cdots$$

$$= 1 + xr + \left(\tfrac{3}{2}x^2 - \tfrac{1}{2}\right)r^2 + \left(\tfrac{5}{2}x^3 - \tfrac{3}{2}x\right)r^3 + \cdots.$$

We note that the coefficients of $1, r, r^2$, and $r^3$ are the same as $p_0(x), p_1(x), p_2(x), p_3(x)$, as was seen earlier. In general, it is easy to see that the coefficient of $r^n$ is $p_n(x)$ ($n = 0, 1, 2, \ldots$).

By differentiating $g(x, r)$ with respect to $r$, it can be seen that

$$(1 - 2xr + r^2)\,\frac{\partial g(x, r)}{\partial r} - (x - r)\,g(x, r) = 0.$$



By substituting $g(x, r) = \sum_{n=0}^\infty p_n(x)r^n$ in this equation, we obtain

$$(1 - 2xr + r^2)\sum_{n=1}^\infty np_n(x)r^{n-1} - (x - r)\sum_{n=0}^\infty p_n(x)r^n = 0.$$

The coefficient of $r^n$ must be zero for each $n$ and for all values of $x$ ($n = 1, 2, \ldots$). We thus have the following identity:

$$(n+1)p_{n+1}(x) - (2n+1)xp_n(x) + np_{n-1}(x) = 0, \quad n = 1, 2, \ldots. \qquad (10.6)$$

This is a recurrence relation that connects any three successive Legendre polynomials. For example, for $p_2(x) = \frac{3}{2}x^2 - \frac{1}{2}$, $p_3(x) = \frac{5}{2}x^3 - \frac{3}{2}x$, we find from (10.6) that

$$p_4(x) = \frac{1}{4}\left[7xp_3(x) - 3p_2(x)\right] = \frac{35}{8}x^4 - \frac{15}{4}x^2 + \frac{3}{8}.$$
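The recurrence (10.6) gives an easy way to generate Legendre polynomials exactly. The sketch below uses Python's `fractions` module so that every coefficient is an exact rational. It then checks the generating function at a rational point. The function names are illustrative, not from the text.

```python
from fractions import Fraction


def legendre_coeffs(n_max):
    """Exact coefficient lists (constant term first) of p_0, ..., p_{n_max},
    generated by the recurrence (10.6): (n+1) p_{n+1} = (2n+1) x p_n - n p_{n-1}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]        # p_0 = 1, p_1 = x
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]                          # x * p_n(x)
        prev = polys[n - 1] + [Fraction(0), Fraction(0)]         # p_{n-1}, padded to equal length
        polys.append([((2 * n + 1) * c - n * d) / (n + 1) for c, d in zip(x_pn, prev)])
    return polys[: n_max + 1]


def evaluate(coeffs, x):
    """Exact evaluation of a polynomial with Fraction coefficients at a rational x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


print(legendre_coeffs(4)[4])   # coefficients of 3/8 - (15/4) x^2 + (35/8) x^4
```

Exact rational evaluation sidesteps the cancellation that floating-point evaluation in the monomial basis would suffer for large $n$, since the coefficients of $p_n(x)$ grow quickly while $|p_n(x)| \le 1$ on $[-1, 1]$.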


10.2.1. Expansion of a Function Using Legendre Polynomials 

Suppose that $f(x)$ is a function defined on $[-1, 1]$ such that $\int_{-1}^1 f(x)p_n(x)\,dx$ exists for $n = 0, 1, 2, \ldots$. Consider the series expansion

$$f(x) = \sum_{i=0}^\infty a_ip_i(x). \qquad (10.7)$$

Multiplying both sides of (10.7) by $p_n(x)$ and then integrating from $-1$ to $1$, we obtain, by the orthogonality of Legendre polynomials,

$$a_n = \left[\int_{-1}^1 p_n^2(x)\,dx\right]^{-1}\int_{-1}^1 f(x)p_n(x)\,dx, \quad n = 0, 1, \ldots.$$

It can be shown that (see Jackson, 1941, page 52)

$$\int_{-1}^1 p_n^2(x)\,dx = \frac{2}{2n+1}, \quad n = 0, 1, 2, \ldots.$$

Hence, the coefficient of $p_n(x)$ in (10.7) is given by

$$a_n = \frac{2n+1}{2}\int_{-1}^1 f(x)p_n(x)\,dx. \qquad (10.8)$$
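Formula (10.8) can be evaluated numerically once the integral is replaced by a quadrature rule. The sketch below assumes Gauss-Legendre quadrature from `numpy.polynomial.legendre`. It relies on `legval`, which evaluates series in the standard Legendre polynomials (those produced by the Rodrigues formula above). The name `legendre_coefficients` is illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval


def legendre_coefficients(f, n_max, npts=100):
    """a_n = (2n+1)/2 * int_{-1}^{1} f(x) p_n(x) dx for n = 0..n_max, as in (10.8);
    the integral is approximated with npts-point Gauss-Legendre quadrature."""
    x, w = leggauss(npts)
    fx = f(x)
    coeffs = []
    for n in range(n_max + 1):
        unit = np.zeros(n + 1)
        unit[n] = 1.0                                  # selects p_n in legval
        coeffs.append((2 * n + 1) / 2 * np.sum(w * fx * legval(x, unit)))
    return np.array(coeffs)


a = legendre_coefficients(np.exp, 12)
print(legval(0.5, a), np.exp(0.5))   # partial sum of (10.7) at x = 0.5 versus e^{0.5}
```

For $f(x) = x^2$ the coefficients come out as $(1/3, 0, 2/3)$, matching $x^2 = \frac{1}{3}p_0(x) + \frac{2}{3}p_2(x)$.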



If $s_n(x) = \sum_{i=0}^n a_ip_i(x)$ denotes the partial sum of the series in (10.7), then

$$s_n(x) = \frac{n+1}{2}\int_{-1}^1 \frac{f(t)}{t - x}\left[p_{n+1}(t)p_n(x) - p_n(t)p_{n+1}(x)\right]dt, \quad n = 0, 1, 2, \ldots. \qquad (10.9)$$

This is known as Christoffel's identity (see Jackson, 1941, page 55). If $f(x)$ is continuous on $[-1, 1]$ and has a derivative at $x = x_0$, then $\lim_{n\to\infty}s_n(x_0) = f(x_0)$, and hence the series in (10.7) converges at $x_0$ to the value $f(x_0)$ (see Jackson, 1941, pages 64-65).


10.3. JACOBI POLYNOMIALS 


Jacobi polynomials, named after the German mathematician Carl Gustav Jacob Jacobi (1804-1851), are orthogonal on $[-1, 1]$ with respect to the weight function $w(x) = (1-x)^\alpha(1+x)^\beta$, $\alpha > -1$, $\beta > -1$. The restrictions on $\alpha$ and $\beta$ are needed to guarantee integrability of $w(x)$ over the interval $[-1, 1]$. These polynomials, which we denote by $p_n^{(\alpha,\beta)}(x)$, can be derived by applying the Rodrigues formula:

$$p_n^{(\alpha,\beta)}(x) = \frac{(-1)^n(1-x)^{-\alpha}(1+x)^{-\beta}}{2^n n!}\,\frac{d^n}{dx^n}\left[(1-x)^{\alpha+n}(1+x)^{\beta+n}\right], \quad n = 0, 1, 2, \ldots. \qquad (10.10)$$

This formula reduces to the one for Legendre polynomials when $\alpha = \beta = 0$. Thus, Legendre polynomials represent a special class of Jacobi polynomials.

Applying the so-called Leibniz formula (see Exercise 4.2 in Chapter 4) concerning the $n$th derivative of a product of two functions, namely, $f_n(x) = (1-x)^{\alpha+n}$ and $g_n(x) = (1+x)^{\beta+n}$ in (10.10), we obtain

$$\frac{d^n}{dx^n}\left[f_n(x)g_n(x)\right] = \sum_{i=0}^n \binom{n}{i} f_n^{(i)}(x)\,g_n^{(n-i)}(x), \quad n = 0, 1, \ldots, \qquad (10.11)$$

where for $i = 0, 1, \ldots, n$, $f_n^{(i)}(x)$ is a constant multiple of $(1-x)^{\alpha+n-i} = (1-x)^\alpha(1-x)^{n-i}$, and $g_n^{(n-i)}(x)$ is a constant multiple of $(1+x)^{\beta+i} = (1+x)^\beta(1+x)^i$. Thus, the $n$th derivative in (10.11) has $(1-x)^\alpha(1+x)^\beta$ as a factor. Using formula (10.10), it can be shown that $p_n^{(\alpha,\beta)}(x)$ is a polynomial of degree $n$ with the leading coefficient (that is, the coefficient of $x^n$) equal to $(1/2^n n!)\,\Gamma(2n + \alpha + \beta + 1)/\Gamma(n + \alpha + \beta + 1)$.
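The Rodrigues formula (10.10) and the stated leading coefficient can be checked directly with polynomial arithmetic. The sketch below is a minimal illustration that assumes $\alpha$ and $\beta$ are nonnegative integers. That restriction is an assumption made so that $(1-x)^\alpha(1+x)^\beta$ is an ordinary polynomial. It uses `numpy.polynomial.Polynomial` for multiplication, differentiation, division, and exact antiderivatives. The helper names are illustrative.

```python
from math import factorial, gamma
from numpy.polynomial import Polynomial


def jacobi_rodrigues(n, alpha, beta):
    """p_n^{(alpha,beta)}(x) from the Rodrigues formula (10.10).
    Assumes nonnegative integer alpha, beta, so the division by the weight is a
    polynomial division that is exact up to rounding."""
    one_minus, one_plus = Polynomial([1.0, -1.0]), Polynomial([1.0, 1.0])   # 1 - x, 1 + x
    weight = one_minus**alpha * one_plus**beta
    nth_derivative = (one_minus ** (alpha + n) * one_plus ** (beta + n)).deriv(n)
    return (-1) ** n / (2**n * factorial(n)) * (nth_derivative // weight)


def weighted_inner(p, q, alpha, beta):
    """int_{-1}^{1} p q (1-x)^alpha (1+x)^beta dx via a polynomial antiderivative."""
    weight = Polynomial([1.0, -1.0]) ** alpha * Polynomial([1.0, 1.0]) ** beta
    F = (p * q * weight).integ()
    return F(1.0) - F(-1.0)


def leading_coefficient(n, alpha, beta):
    """Gamma(2n+alpha+beta+1) / (2^n n! Gamma(n+alpha+beta+1)), as stated in the text."""
    return gamma(2 * n + alpha + beta + 1) / (2**n * factorial(n) * gamma(n + alpha + beta + 1))


print(jacobi_rodrigues(2, 0, 0).coef)   # alpha = beta = 0 gives the Legendre p_2: [-0.5, 0, 1.5]
```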





10.4. CHEBYSHEV POLYNOMIALS 

These polynomials were named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821-1894). In this section, two kinds of Chebyshev polynomials will be studied: Chebyshev polynomials of the first kind and of the second kind.

10.4.1. Chebyshev Polynomials of the First Kind 

These polynomials are denoted by $T_n(x)$ and defined as

$$T_n(x) = \cos(n\operatorname{Arccos} x), \quad n = 0, 1, \ldots, \qquad (10.12)$$

where $0 \le \operatorname{Arccos} x \le \pi$. Note that $T_n(x)$ can be expressed as

$$T_n(x) = x^n + \binom{n}{2}x^{n-2}(x^2-1) + \binom{n}{4}x^{n-4}(x^2-1)^2 + \cdots, \quad n = 0, 1, \ldots, \qquad (10.13)$$

where $-1 \le x \le 1$. Historically, the polynomials defined by (10.13) were originally called Chebyshev polynomials without any qualifying expression. Using (10.13), it is easy to obtain the first few of these polynomials:

$$T_0(x) = 1,$$

$$T_1(x) = x,$$

$$T_2(x) = 2x^2 - 1,$$

$$T_3(x) = 4x^3 - 3x,$$

$$T_4(x) = 8x^4 - 8x^2 + 1,$$

$$T_5(x) = 16x^5 - 20x^3 + 5x,$$

$$\vdots$$

The following are some properties of $T_n(x)$:

1. $-1 \le T_n(x) \le 1$ for $-1 \le x \le 1$.

2. $T_n(-x) = (-1)^nT_n(x)$.

3. $T_n(x)$ has simple zeros at the following $n$ points:

$$\zeta_i = \cos\left[\frac{(2i-1)\pi}{2n}\right], \quad i = 1, 2, \ldots, n.$$

We may recall that these zeros, also referred to as Chebyshev points, were instrumental in minimizing the error of Lagrange interpolation in Chapter 9 (see Theorem 9.2.3).

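4. The weight function for $T_n(x)$ is $w(x) = (1-x^2)^{-1/2}$. To show this, we have that for two nonnegative integers, $m, n$,

$$\int_0^\pi \cos m\theta\cos n\theta\,d\theta = 0, \quad m \ne n, \qquad (10.14)$$

and

$$\int_0^\pi \cos^2 n\theta\,d\theta = \begin{cases}\pi/2, & n \ne 0,\\ \pi, & n = 0.\end{cases} \qquad (10.15)$$

Making the change of variables $x = \cos\theta$ in (10.14) and (10.15), we obtain

$$\int_{-1}^1 \frac{T_m(x)T_n(x)}{(1-x^2)^{1/2}}\,dx = 0, \quad m \ne n,$$

$$\int_{-1}^1 \frac{T_n^2(x)}{(1-x^2)^{1/2}}\,dx = \begin{cases}\pi/2, & n \ne 0,\\ \pi, & n = 0.\end{cases}$$

This shows that $\{T_n(x)\}_{n=0}^\infty$ forms a sequence of orthogonal polynomials on $[-1, 1]$ with respect to $w(x) = (1-x^2)^{-1/2}$.

5. We have

$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), \quad n = 1, 2, \ldots. \qquad (10.16)$$

To show this recurrence relation, we use the following trigonometric identities:

$$\cos[(n+1)\theta] = \cos n\theta\cos\theta - \sin n\theta\sin\theta,$$

$$\cos[(n-1)\theta] = \cos n\theta\cos\theta + \sin n\theta\sin\theta.$$

Adding these identities, we obtain

$$\cos[(n+1)\theta] = 2\cos n\theta\cos\theta - \cos[(n-1)\theta]. \qquad (10.17)$$

If we set $x = \cos\theta$ and $\cos n\theta = T_n(x)$ in (10.17), we obtain (10.16). Recall that $T_0(x) = 1$ and $T_1(x) = x$.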
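Properties 1 through 5 can be spot-checked numerically. The sketch below generates $T_n(x)$ from (10.16) and compares the result with definition (10.12). It then computes the orthogonality integrals of property 4 by Gauss-Chebyshev quadrature. That rule places its nodes at the zeros of property 3, with equal weights $\pi/N$, and is exact for polynomials of degree below $2N$. The function names are illustrative.

```python
import numpy as np


def chebyshev_T(n_max, x):
    """Rows T_0(x), ..., T_{n_max}(x) generated by (10.16): T_{n+1} = 2x T_n - T_{n-1}."""
    x = np.asarray(x, dtype=float)
    T = [np.ones_like(x), x.copy()]
    for n in range(1, n_max):
        T.append(2 * x * T[n] - T[n - 1])
    return np.array(T[: n_max + 1])


def gauss_chebyshev_gram(n_max, npts=32):
    """Matrix of int_{-1}^{1} T_m T_n (1-x^2)^{-1/2} dx, using the zeros of T_npts
    as nodes with equal weights pi / npts (exact for degree < 2 npts)."""
    i = np.arange(1, npts + 1)
    nodes = np.cos((2 * i - 1) * np.pi / (2 * npts))
    T = chebyshev_T(n_max, nodes)
    return (np.pi / npts) * T @ T.T


x = np.linspace(-1.0, 1.0, 201)
T = chebyshev_T(8, x)
print(np.max(np.abs(T[5] - np.cos(5 * np.arccos(x)))))   # agreement with definition (10.12)
print(np.round(gauss_chebyshev_gram(4), 12))               # diagonal pi, pi/2, ..., pi/2
```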


10.4.2. Chebyshev Polynomials of the Second Kind 

These polynomials are defined in terms of Chebyshev polynomials of the first kind as follows: Differentiating $T_n(x) = \cos n\theta$ with respect to $x = \cos\theta$, we obtain

$$\frac{dT_n(x)}{dx} = -n\sin n\theta\,\frac{d\theta}{dx} = n\,\frac{\sin n\theta}{\sin\theta}.$$

Let $U_n(x)$ be defined as

$$U_n(x) = \frac{1}{n+1}\,\frac{dT_{n+1}(x)}{dx} = \frac{\sin[(n+1)\theta]}{\sin\theta}, \quad n = 0, 1, \ldots. \qquad (10.18)$$

This polynomial, which is of degree $n$, is called a Chebyshev polynomial of the second kind. Note that

$$U_n(x) = \frac{\sin n\theta\cos\theta + \cos n\theta\sin\theta}{\sin\theta} = xU_{n-1}(x) + T_n(x), \quad n = 1, 2, \ldots, \qquad (10.19)$$

where $U_0(x) = 1$. Formula (10.19) provides a recurrence relation for $U_n(x)$. Another recurrence relation that is free of $T_n(x)$ can be obtained from the following identity:

$$\sin[(n+1)\theta] = 2\sin n\theta\cos\theta - \sin[(n-1)\theta].$$

Hence,

$$U_n(x) = 2xU_{n-1}(x) - U_{n-2}(x), \quad n = 2, 3, \ldots. \qquad (10.20)$$

Using the fact that $U_0(x) = 1$, $U_1(x) = 2x$, formula (10.20) can be used to derive expressions for $U_n(x)$, $n = 2, 3, \ldots$. It is easy to see that the leading coefficient of $x^n$ in $U_n(x)$ is $2^n$.

We now show that $\{U_n(x)\}_{n=0}^\infty$ forms a sequence of orthogonal polynomials with respect to the weight function $w(x) = (1-x^2)^{1/2}$ over $[-1, 1]$. From the formula

$$\int_0^\pi \sin[(m+1)\theta]\sin[(n+1)\theta]\,d\theta = 0, \quad m \ne n,$$

we get, after making the change of variables $x = \cos\theta$,

$$\int_{-1}^1 U_m(x)U_n(x)(1-x^2)^{1/2}\,dx = 0, \quad m \ne n.$$

This shows that $w(x) = (1-x^2)^{1/2}$ is a weight function for the sequence $\{U_n(x)\}_{n=0}^\infty$. Note that $\int_{-1}^1 U_n^2(x)(1-x^2)^{1/2}\,dx = \pi/2$.
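A quick numerical check of (10.18), (10.20), and the orthogonality just shown follows. This sketch uses the Gauss quadrature rule for the weight $(1-x^2)^{1/2}$, with nodes $\cos[k\pi/(N+1)]$ and weights $\frac{\pi}{N+1}\sin^2[k\pi/(N+1)]$, $k = 1, \ldots, N$. The rule is exact for polynomials of degree below $2N$. The function names are illustrative.

```python
import numpy as np


def chebyshev_U(n_max, x):
    """Rows U_0(x), ..., U_{n_max}(x) from (10.20): U_n = 2x U_{n-1} - U_{n-2}, U_0 = 1, U_1 = 2x."""
    x = np.asarray(x, dtype=float)
    U = [np.ones_like(x), 2 * x]
    for n in range(2, n_max + 1):
        U.append(2 * x * U[n - 1] - U[n - 2])
    return np.array(U[: n_max + 1])


def gram_second_kind(n_max, npts=32):
    """Matrix of int_{-1}^{1} U_m U_n (1-x^2)^{1/2} dx by Gauss quadrature for this weight."""
    k = np.arange(1, npts + 1)
    angle = k * np.pi / (npts + 1)
    U = chebyshev_U(n_max, np.cos(angle))
    weights = (np.pi / (npts + 1)) * np.sin(angle) ** 2
    return (U * weights) @ U.T


theta = np.linspace(0.1, 3.0, 50)
U = chebyshev_U(7, np.cos(theta))
print(np.max(np.abs(U[3] - np.sin(4 * theta) / np.sin(theta))))   # agreement with (10.18)
print(np.round(gram_second_kind(4), 12))                           # (pi/2) times the identity
```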

10.5. HERMITE POLYNOMIALS 

Hermite polynomials, denoted by $\{H_n(x)\}_{n=0}^\infty$, were named after the French mathematician Charles Hermite (1822-1901). They are defined by the Rodrigues formula,

$$H_n(x) = (-1)^n e^{x^2/2}\,\frac{d^n(e^{-x^2/2})}{dx^n}, \quad n = 0, 1, 2, \ldots. \qquad (10.21)$$

From (10.21), we have

$$\frac{d^n(e^{-x^2/2})}{dx^n} = (-1)^n e^{-x^2/2}H_n(x). \qquad (10.22)$$

By differentiating the two sides in (10.22), we obtain

$$\frac{d^{n+1}(e^{-x^2/2})}{dx^{n+1}} = (-1)^n\left[-xe^{-x^2/2}H_n(x) + e^{-x^2/2}\,\frac{dH_n(x)}{dx}\right]. \qquad (10.23)$$

But, from (10.21),

$$\frac{d^{n+1}(e^{-x^2/2})}{dx^{n+1}} = (-1)^{n+1}e^{-x^2/2}H_{n+1}(x). \qquad (10.24)$$

From (10.23) and (10.24) we then have

$$H_{n+1}(x) = xH_n(x) - \frac{dH_n(x)}{dx}, \quad n = 0, 1, 2, \ldots,$$

which defines a recurrence relation for the sequence $\{H_n(x)\}_{n=0}^\infty$. Since $H_0(x) = 1$, it follows by induction, using this relation, that $H_n(x)$ is a polynomial of degree $n$. Its leading coefficient is equal to one.



Note that if $w(x) = e^{-x^2/2}$, then

$$w(x - t) = \exp\left[-\frac{1}{2}(x - t)^2\right] = \exp\left(-\frac{x^2}{2} + tx - \frac{t^2}{2}\right) = w(x)\exp\left(tx - \frac{t^2}{2}\right).$$

Applying Taylor's expansion to $w(x - t)$, we obtain

$$w(x - t) = \sum_{n=0}^\infty \frac{(-1)^n t^n}{n!}\,\frac{d^n[w(x)]}{dx^n} = \sum_{n=0}^\infty \frac{t^n}{n!}H_n(x)w(x).$$

Consequently, $H_n(x)$ is the coefficient of $t^n/n!$ in the expansion of $\exp(tx - t^2/2)$. It follows that

$$H_n(x) = x^n - \frac{n^{[2]}}{2\cdot 1!}x^{n-2} + \frac{n^{[4]}}{2^2\cdot 2!}x^{n-4} - \frac{n^{[6]}}{2^3\cdot 3!}x^{n-6} + \cdots,$$

where $n^{[r]} = n(n-1)(n-2)\cdots(n-r+1)$. This particular representation of $H_n(x)$ is given in Kendall and Stuart (1977, page 167). For example, the first seven Hermite polynomials are

$$H_0(x) = 1,$$

$$H_1(x) = x,$$

$$H_2(x) = x^2 - 1,$$

$$H_3(x) = x^3 - 3x,$$

$$H_4(x) = x^4 - 6x^2 + 3,$$

$$H_5(x) = x^5 - 10x^3 + 15x,$$

$$H_6(x) = x^6 - 15x^4 + 45x^2 - 15,$$

$$\vdots$$

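Another recurrence relation that does not use the derivative of $H_n(x)$ is given by

$$H_{n+1}(x) = xH_n(x) - nH_{n-1}(x), \quad n = 1, 2, \ldots, \qquad (10.25)$$

with $H_0(x) = 1$ and $H_1(x) = x$. To show this, we use (10.21) in (10.25):

$$(-1)^{n+1}e^{x^2/2}\,\frac{d^{n+1}(e^{-x^2/2})}{dx^{n+1}} = x(-1)^n e^{x^2/2}\,\frac{d^n(e^{-x^2/2})}{dx^n} - n(-1)^{n-1}e^{x^2/2}\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}},$$

or equivalently,

$$\frac{d^{n+1}(e^{-x^2/2})}{dx^{n+1}} = -x\,\frac{d^n(e^{-x^2/2})}{dx^n} - n\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}}.$$

This is true given the fact that

$$\frac{d(e^{-x^2/2})}{dx} = -xe^{-x^2/2}.$$

Hence,

$$\frac{d^{n+1}(e^{-x^2/2})}{dx^{n+1}} = \frac{d^n}{dx^n}\left(-xe^{-x^2/2}\right) = -n\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}} - x\,\frac{d^n(e^{-x^2/2})}{dx^n}, \qquad (10.26)$$

where the last equality results from applying Leibniz's formula to the $n$th derivative of the product $-xe^{-x^2/2}$ in (10.26).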
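The three descriptions of $H_n(x)$ given so far should agree: the recurrence (10.25), the derivative recurrence $H_{n+1} = xH_n - H_n'$, and the explicit falling-factorial formula. The sketch below checks this with exact integer coefficient lists (constant term first). The helper names are illustrative.

```python
from math import factorial


def hermite_three_term(n_max):
    """Integer coefficient lists (constant term first) of H_0, ..., H_{n_max} from (10.25):
    H_{n+1} = x H_n - n H_{n-1}, with H_0 = 1 and H_1 = x."""
    H = [[1], [0, 1]]
    for n in range(1, n_max):
        x_hn = [0] + H[n]                 # x * H_n(x)
        prev = H[n - 1] + [0, 0]          # H_{n-1}, padded to the same length
        H.append([a - n * b for a, b in zip(x_hn, prev)])
    return H[: n_max + 1]


def derivative_step(h):
    """One step of H_{n+1} = x H_n - H_n', applied to a coefficient list."""
    x_h = [0] + h
    dh = [k * h[k] for k in range(1, len(h))] + [0, 0]   # H_n', padded
    return [a - b for a, b in zip(x_h, dh)]


def hermite_explicit(n):
    """Falling-factorial formula: H_n = sum_k (-1)^k n^{[2k]} x^{n-2k} / (2^k k!)."""
    c = [0] * (n + 1)
    for k in range(n // 2 + 1):
        falling = factorial(n) // factorial(n - 2 * k)        # n^{[2k]}
        c[n - 2 * k] = (-1) ** k * (falling // (2**k * factorial(k)))
    return c


print(hermite_three_term(6)[6])   # [-15, 0, 45, 0, -15, 0, 1], i.e. x^6 - 15x^4 + 45x^2 - 15
```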

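We now show that $\{H_n(x)\}_{n=0}^\infty$ forms a sequence of orthogonal polynomials with respect to the weight function $w(x) = e^{-x^2/2}$ over $(-\infty, \infty)$. For this purpose, let $m, n$ be nonnegative integers, and let $c$ be defined as

$$c = \int_{-\infty}^\infty e^{-x^2/2}H_m(x)H_n(x)\,dx. \qquad (10.27)$$

Then, from (10.21) and (10.27), we have

$$c = (-1)^n\int_{-\infty}^\infty H_m(x)\,\frac{d^n(e^{-x^2/2})}{dx^n}\,dx.$$

Integrating by parts gives

$$c = (-1)^n\left\{\left[H_m(x)\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}}\right]_{-\infty}^{\infty} - \int_{-\infty}^\infty \frac{dH_m(x)}{dx}\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}}\,dx\right\}$$

$$= (-1)^{n+1}\int_{-\infty}^\infty \frac{dH_m(x)}{dx}\,\frac{d^{n-1}(e^{-x^2/2})}{dx^{n-1}}\,dx. \qquad (10.28)$$

Formula (10.28) is true because $H_m(x)\,d^{n-1}(e^{-x^2/2})/dx^{n-1}$, which is a polynomial multiplied by $e^{-x^2/2}$, has a limit equal to zero as $x \to \pm\infty$. By repeating the process of integration by parts $m-1$ more times, we obtain for $n > m$

$$c = (-1)^{m+n}\int_{-\infty}^\infty \frac{d^m[H_m(x)]}{dx^m}\,\frac{d^{n-m}(e^{-x^2/2})}{dx^{n-m}}\,dx. \qquad (10.29)$$

Note that since $H_m(x)$ is a polynomial of degree $m$ with a leading coefficient equal to one, $d^m[H_m(x)]/dx^m$ is a constant equal to $m!$. Furthermore, since $n > m$, then

$$\int_{-\infty}^\infty \frac{d^{n-m}(e^{-x^2/2})}{dx^{n-m}}\,dx = \left[\frac{d^{n-m-1}(e^{-x^2/2})}{dx^{n-m-1}}\right]_{-\infty}^{\infty} = 0.$$

It follows that $c = 0$. We can also arrive at the same conclusion if $n < m$. Hence,

$$\int_{-\infty}^\infty e^{-x^2/2}H_m(x)H_n(x)\,dx = 0, \quad m \ne n.$$

This shows that $\{H_n(x)\}_{n=0}^\infty$ is a sequence of orthogonal polynomials with respect to $w(x) = e^{-x^2/2}$ over $(-\infty, \infty)$.

Note that if $m = n$ in (10.29), then

$$c = \int_{-\infty}^\infty \frac{d^n[H_n(x)]}{dx^n}\,e^{-x^2/2}\,dx = n!\int_{-\infty}^\infty e^{-x^2/2}\,dx = 2\,n!\int_0^\infty e^{-x^2/2}\,dx = n!\sqrt{2\pi}.$$

By comparison with (10.27), we conclude that

$$\int_{-\infty}^\infty e^{-x^2/2}H_n^2(x)\,dx = n!\sqrt{2\pi}. \qquad (10.30)$$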


Hermite polynomials can be used to provide the following series expansion of a function $f(x)$:

$$f(x) = \sum_{n=0}^\infty c_nH_n(x), \qquad (10.31)$$

where

$$c_n = \frac{1}{n!\sqrt{2\pi}}\int_{-\infty}^\infty e^{-x^2/2}f(x)H_n(x)\,dx, \quad n = 0, 1, \ldots. \qquad (10.32)$$

Formula (10.32) follows from multiplying both sides of (10.31) with $e^{-x^2/2}H_n(x)$, integrating over $(-\infty, \infty)$, and noting formula (10.30) and the orthogonality of the sequence $\{H_n(x)\}_{n=0}^\infty$.
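Formulas (10.30) and (10.32) can be evaluated with NumPy's Gauss quadrature for the weight $e^{-x^2/2}$ (`numpy.polynomial.hermite_e.hermegauss`). NumPy's "HermiteE" (probabilists') polynomials evaluated by `hermeval` coincide with the $H_n(x)$ of (10.21). This is a sketch only. The quadrature order and the helper names are illustrative choices.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(f, n_max, npts=60):
    """c_n of (10.32): (1 / (n! sqrt(2 pi))) int exp(-x^2/2) f(x) H_n(x) dx, n = 0..n_max."""
    x, w = hermegauss(npts)               # nodes and weights for the weight exp(-x^2/2)
    fx = f(x)
    out = []
    for n in range(n_max + 1):
        unit = np.zeros(n + 1)
        unit[n] = 1.0                     # selects H_n in hermeval
        out.append(np.sum(w * fx * hermeval(x, unit)) / (factorial(n) * sqrt(2 * pi)))
    return np.array(out)


def hermite_norm_squared(n, npts=60):
    """Quadrature value of int exp(-x^2/2) H_n(x)^2 dx, to compare with n! sqrt(2 pi) in (10.30)."""
    x, w = hermegauss(npts)
    unit = np.zeros(n + 1)
    unit[n] = 1.0
    return np.sum(w * hermeval(x, unit) ** 2)


print(hermite_coefficients(lambda x: x**4, 4))   # x^4 = H_4 + 6 H_2 + 3 H_0
```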


10.6. LAGUERRE POLYNOMIALS 


Laguerre polynomials were named after the French mathematician Edmond Laguerre (1834-1886). They are denoted by $L_n^{(\alpha)}(x)$ and are defined over the interval $(0, \infty)$, $n = 0, 1, 2, \ldots$, where $\alpha > -1$.

The development of these polynomials is based on an application of the Leibniz formula to finding the $n$th derivative of the function

$$\phi_n(x) = x^{\alpha+n}e^{-x}.$$

More specifically, for $\alpha > -1$, $L_n^{(\alpha)}(x)$ is defined by a Rodrigues-type formula, namely,

$$L_n^{(\alpha)}(x) = (-1)^n e^x x^{-\alpha}\,\frac{d^n(x^{n+\alpha}e^{-x})}{dx^n}, \quad n = 0, 1, 2, \ldots.$$

We shall henceforth use $L_n(x)$ instead of $L_n^{(\alpha)}(x)$.

From this definition, we conclude that $L_n(x)$ is a polynomial of degree $n$ with a leading coefficient equal to one. It can also be shown that Laguerre polynomials are orthogonal with respect to the weight function $w(x) = e^{-x}x^\alpha$ over $(0, \infty)$, that is,

$$\int_0^\infty e^{-x}x^\alpha L_m(x)L_n(x)\,dx = 0, \quad m \ne n$$

(see Jackson, 1941, page 185). Furthermore, if $m = n$, then

$$\int_0^\infty e^{-x}x^\alpha[L_n(x)]^2\,dx = n!\,\Gamma(\alpha + n + 1), \quad n = 0, 1, \ldots.$$

A function $f(x)$ can be expressed as an infinite series of Laguerre polynomials of the form

$$f(x) = \sum_{n=0}^\infty c_nL_n(x),$$

where

$$c_n = \frac{1}{n!\,\Gamma(\alpha + n + 1)}\int_0^\infty e^{-x}x^\alpha L_n(x)f(x)\,dx, \quad n = 0, 1, 2, \ldots.$$

A recurrence relation for $L_n(x)$ is developed as follows: From the definition of $L_n(x)$, we have

$$(-1)^n x^\alpha e^{-x}L_n(x) = \frac{d^n(x^{n+\alpha}e^{-x})}{dx^n}, \quad n = 0, 1, 2, \ldots. \qquad (10.33)$$

Replacing $n$ by $n+1$ in (10.33) gives

$$(-1)^{n+1}x^\alpha e^{-x}L_{n+1}(x) = \frac{d^{n+1}(x^{n+\alpha+1}e^{-x})}{dx^{n+1}}. \qquad (10.34)$$

Now,

$$x^{n+\alpha+1}e^{-x} = x\left(x^{n+\alpha}e^{-x}\right). \qquad (10.35)$$

Applying the Leibniz formula for the $(n+1)$st derivative of the product on the right-hand side of (10.35) and noting that the $n$th derivative of $x$ is zero for $n = 2, 3, 4, \ldots$, we obtain

$$\frac{d^{n+1}(x^{n+\alpha+1}e^{-x})}{dx^{n+1}} = x\,\frac{d^{n+1}(x^{n+\alpha}e^{-x})}{dx^{n+1}} + (n+1)\,\frac{d^n(x^{n+\alpha}e^{-x})}{dx^n}$$

$$= x\,\frac{d}{dx}\left[\frac{d^n(x^{n+\alpha}e^{-x})}{dx^n}\right] + (n+1)\,\frac{d^n(x^{n+\alpha}e^{-x})}{dx^n}. \qquad (10.36)$$

Using (10.33) and (10.34) in (10.36) gives

$$(-1)^{n+1}x^\alpha e^{-x}L_{n+1}(x) = x\,\frac{d}{dx}\left[(-1)^n x^\alpha e^{-x}L_n(x)\right] + (-1)^n(n+1)x^\alpha e^{-x}L_n(x)$$

$$= (-1)^n x^\alpha e^{-x}\left[(\alpha + n + 1 - x)L_n(x) + x\,\frac{dL_n(x)}{dx}\right]. \qquad (10.37)$$

Multiplying the two sides of (10.37) by $(-1)^{n+1}e^xx^{-\alpha}$, we obtain

$$L_{n+1}(x) = (x - \alpha - n - 1)L_n(x) - x\,\frac{dL_n(x)}{dx}, \quad n = 0, 1, 2, \ldots.$$

This recurrence relation gives $L_{n+1}(x)$ in terms of $L_n(x)$ and its derivative. Note that $L_0(x) = 1$.



Another recurrence relation that does not require using the derivative of $L_n(x)$ is given by (see Jackson, 1941, page 186)

$$L_{n+1}(x) = (x - \alpha - 2n - 1)L_n(x) - n(\alpha + n)L_{n-1}(x), \quad n = 1, 2, \ldots.$$
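The two Laguerre recurrences should produce the same monic polynomials. The sketch below checks this with exact rational coefficients (constant term first). It also checks the orthogonality and norm formulas for an integer $\alpha$. For the integrals it uses $\int_0^\infty e^{-x}x^k\,dx = k!$, which is why this check is restricted to integer $\alpha \ge 0$. The helper names are illustrative.

```python
from fractions import Fraction
from math import factorial


def laguerre_three_term(n_max, alpha):
    """Monic L_0, ..., L_{n_max} (coefficients, constant term first) from
    L_{n+1} = (x - alpha - 2n - 1) L_n - n (alpha + n) L_{n-1}, with L_1 = x - alpha - 1."""
    alpha = Fraction(alpha)
    L = [[Fraction(1)], [-alpha - 1, Fraction(1)]]
    for n in range(1, n_max):
        x_ln = [Fraction(0)] + L[n]                                   # x L_n
        shift = [(alpha + 2 * n + 1) * c for c in L[n]] + [Fraction(0)]
        prev = L[n - 1] + [Fraction(0), Fraction(0)]
        L.append([a - b - n * (alpha + n) * d for a, b, d in zip(x_ln, shift, prev)])
    return L[: n_max + 1]


def derivative_step(ln, n, alpha):
    """L_{n+1} = (x - alpha - n - 1) L_n - x L_n', applied to a coefficient list."""
    alpha = Fraction(alpha)
    x_ln = [Fraction(0)] + ln
    shift = [(alpha + n + 1) * c for c in ln] + [Fraction(0)]
    x_dln = [k * c for k, c in enumerate(ln)] + [Fraction(0)]         # x L_n'(x)
    return [a - b - c for a, b, c in zip(x_ln, shift, x_dln)]


def inner(p, q, alpha):
    """int_0^inf e^{-x} x^alpha p(x) q(x) dx for integer alpha >= 0."""
    return sum(a * b * factorial(alpha + i + j) for i, a in enumerate(p) for j, b in enumerate(q))


print(laguerre_three_term(2, 0)[2])   # alpha = 0: L_2(x) = x^2 - 4x + 2
```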

10.7. LEAST-SQUARES APPROXIMATION WITH ORTHOGONAL 
POLYNOMIALS 

In this section, we consider an approximation problem concerning a continuous function. Suppose that we have a set of polynomials, $\{p_i(x)\}_{i=0}^n$, orthogonal with respect to a weight function $w(x)$ over the interval $[a, b]$. Let $f(x)$ be a continuous function on $[a, b]$. We wish to approximate $f(x)$ by the sum $\sum_{i=0}^n c_ip_i(x)$, where $c_0, c_1, \ldots, c_n$ are constants to be determined by minimizing the function

$$\gamma(c_0, c_1, \ldots, c_n) = \int_a^b\left[\sum_{i=0}^n c_ip_i(x) - f(x)\right]^2 w(x)\,dx,$$

that is, $\gamma$ is the square of the norm $\left\|\sum_{i=0}^n c_ip_i - f\right\|_\omega$. If we differentiate $\gamma$ with respect to $c_0, c_1, \ldots, c_n$ and equate the partial derivatives to zero, we obtain

$$\frac{\partial\gamma}{\partial c_j} = 2\int_a^b\left[\sum_{i=0}^n c_ip_i(x) - f(x)\right]p_j(x)w(x)\,dx = 0, \quad j = 0, 1, \ldots, n.$$

Hence,

$$\sum_{i=0}^n\left[\int_a^b p_i(x)p_j(x)w(x)\,dx\right]c_i = \int_a^b f(x)p_j(x)w(x)\,dx, \quad j = 0, 1, \ldots, n. \qquad (10.38)$$

Equations (10.38) can be written in vector form as

$$\mathbf{S}\mathbf{c} = \mathbf{u}, \qquad (10.39)$$

where $\mathbf{c} = (c_0, c_1, \ldots, c_n)'$, $\mathbf{u} = (u_0, u_1, \ldots, u_n)'$ with $u_j = \int_a^b f(x)p_j(x)w(x)\,dx$, and $\mathbf{S}$ is an $(n+1)\times(n+1)$ matrix whose $(i, j)$th element, $s_{ij}$, is given by

$$s_{ij} = \int_a^b p_i(x)p_j(x)w(x)\,dx, \quad i, j = 0, 1, \ldots, n.$$

Since $p_0(x), p_1(x), \ldots, p_n(x)$ are orthogonal, $\mathbf{S}$ must be a diagonal matrix with diagonal elements given by

$$s_{ii} = \int_a^b p_i^2(x)w(x)\,dx = \|p_i\|_\omega^2, \quad i = 0, 1, \ldots, n.$$

From equation (10.39) we get the solution $\mathbf{c} = \mathbf{S}^{-1}\mathbf{u}$. The $i$th element of $\mathbf{c}$ is therefore of the form

$$c_i = \frac{u_i}{\|p_i\|_\omega^2} = \frac{(f\cdot p_i)_\omega}{\|p_i\|_\omega^2}, \quad i = 0, 1, \ldots, n.$$

For such a value of $\mathbf{c}$, $\gamma$ has an absolute minimum, since $\mathbf{S}$ is positive definite. It follows that the linear combination

$$p_n^*(x) = \sum_{i=0}^n c_ip_i(x) = \sum_{i=0}^n \frac{(f\cdot p_i)_\omega}{\|p_i\|_\omega^2}\,p_i(x) \qquad (10.40)$$

minimizes $\gamma$. We refer to $p_n^*(x)$ as the least-squares polynomial approximation of $f(x)$ with respect to $p_0(x), p_1(x), \ldots, p_n(x)$.

If $\{p_n(x)\}_{n=0}^\infty$ is a sequence of orthogonal polynomials, then $p_n^*(x)$ in (10.40) represents a partial sum of the infinite series $\sum_{n=0}^\infty \left[(f\cdot p_n)_\omega/\|p_n\|_\omega^2\right]p_n(x)$. This series may fail to converge point by point to $f(x)$. It converges, however, to $f(x)$ in the norm $\|\cdot\|_\omega$. This is shown in the next theorem.


Theorem 10.7.1. If $f: [a, b] \to R$ is continuous, then

$$\int_a^b [f(x) - p_n^*(x)]^2 w(x)\,dx \to 0$$

as $n \to \infty$, where $p_n^*(x)$ is defined by formula (10.40).

Proof. By the Weierstrass theorem (Theorem 9.1.1), there exists a polynomial $b_n(x)$ of degree $n$ that converges uniformly to $f(x)$ on $[a, b]$, that is,

$$\sup_{a \le x \le b}|f(x) - b_n(x)| \to 0 \quad \text{as } n \to \infty.$$

Hence,

$$\int_a^b [f(x) - b_n(x)]^2 w(x)\,dx \to 0$$

as $n \to \infty$, since

$$\int_a^b [f(x) - b_n(x)]^2 w(x)\,dx \le \sup_{a \le x \le b}|f(x) - b_n(x)|^2\int_a^b w(x)\,dx.$$

Furthermore,

$$\int_a^b [f(x) - p_n^*(x)]^2 w(x)\,dx \le \int_a^b [f(x) - b_n(x)]^2 w(x)\,dx, \qquad (10.41)$$

since, by definition, $p_n^*(x)$ is the least-squares polynomial approximation of $f(x)$. From inequality (10.41) we conclude that $\|f - p_n^*\|_\omega \to 0$ as $n \to \infty$. □
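Formula (10.40) and Theorem 10.7.1 can be illustrated with the Legendre polynomials ($w(x) = 1$ on $[-1, 1]$, $\|p_i\|^2 = 2/(2i+1)$). The sketch below approximates every integral with one fixed 200-point Gauss-Legendre rule, which is an assumption of this illustration. It uses the test function $f(x) = |x|$, which is continuous but not differentiable at 0. Because the same rule defines the norm, the computed coefficients are exact least-squares coefficients for that discrete norm, so the error sequence is non-increasing.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

NODES, WEIGHTS = leggauss(200)   # one quadrature rule for every integral on [-1, 1]


def least_squares_coeffs(f, n):
    """c_i = (f . p_i) / ||p_i||^2 from (10.40), with Legendre p_i and ||p_i||^2 = 2/(2i+1)."""
    fx = f(NODES)
    return np.array([np.sum(WEIGHTS * fx * legval(NODES, np.eye(i + 1)[i])) * (2 * i + 1) / 2
                     for i in range(n + 1)])


def l2_error(f, c):
    """||f - p_n^*||, computed with the same quadrature rule."""
    return np.sqrt(np.sum(WEIGHTS * (f(NODES) - legval(NODES, c)) ** 2))


errors = [l2_error(np.abs, least_squares_coeffs(np.abs, n)) for n in range(41)]
print(errors[0], errors[10], errors[40])   # decreasing toward 0, as Theorem 10.7.1 predicts
```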


10.8. ORTHOGONAL POLYNOMIALS DEFINED ON A FINITE SET 

In this section, we consider polynomials, $p_0(x), p_1(x), \ldots, p_n(x)$, defined on a finite set $D = \{x_0, x_1, \ldots, x_n\}$ such that $a \le x_i \le b$, $i = 0, 1, \ldots, n$. These polynomials are orthogonal with respect to a weight function $w^*(x)$ over $D$ if

$$\sum_{i=0}^n w^*(x_i)p_m(x_i)p_\nu(x_i) = 0, \quad m \ne \nu.$$

Such polynomials are said to be orthogonal of the discrete type. For example, the set of discrete Chebyshev polynomials, $\{t_i(j, n)\}_{i=0}^n$, which are defined over the set of integers $j = 0, 1, \ldots, n$, are orthogonal with respect to $w^*(j) = 1$, $j = 0, 1, 2, \ldots, n$, and are given by the following formula (see Abramowitz and Stegun, 1964, page 791):

$$t_i(j, n) = \sum_{k=0}^i (-1)^k\binom{i}{k}\binom{i+k}{k}\frac{j!\,(n-k)!}{(j-k)!\,n!}, \quad i = 0, 1, \ldots, n;\ j = 0, 1, \ldots, n, \qquad (10.42)$$

where $j!/(j-k)!$ is read as $j(j-1)\cdots(j-k+1)$, which is zero when $k > j$. For example, for $i = 0, 1, 2$, we have

$$t_0(j, n) = 1, \quad j = 0, 1, 2, \ldots, n,$$

$$t_1(j, n) = 1 - \frac{2j}{n}, \quad j = 0, 1, \ldots, n,$$

$$t_2(j, n) = 1 - \frac{6j}{n} + \frac{6j(j-1)}{n(n-1)} = 1 - \frac{6j}{n}\left(\frac{n-j}{n-1}\right), \quad j = 0, 1, 2, \ldots, n.$$

A recurrence relation for the discrete Chebyshev polynomials is of the form

$$(i+1)(n-i)\,t_{i+1}(j, n) = (2i+1)(n-2j)\,t_i(j, n) - i(n+i+1)\,t_{i-1}(j, n), \quad i = 1, 2, \ldots, n. \qquad (10.43)$$
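Formula (10.42) can be coded directly with exact rational arithmetic. The sketch below uses the falling-factorial reading of $j!/(j-k)!$ noted after (10.42). It then checks discrete orthogonality with unit weights, the closed forms of $t_1$ and $t_2$, and the recurrence (10.43) for $i = 1, \ldots, n-1$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def falling(j, k):
    """Falling factorial j (j-1) ... (j-k+1); equals j!/(j-k)! for k <= j and 0 for k > j."""
    out = 1
    for s in range(k):
        out *= j - s
    return out


def t(i, j, n):
    """Discrete Chebyshev polynomial t_i(j, n) of formula (10.42), computed exactly."""
    return sum(Fraction((-1) ** k * comb(i, k) * comb(i + k, k) * falling(j, k), falling(n, k))
               for k in range(i + 1))


n = 6
print([t(2, j, n) for j in range(n + 1)])   # t_2(j, 6) = 1 - j(6 - j)/5
```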


10.9. APPLICATIONS IN STATISTICS 

Orthogonal polynomials play an important role in approximating distribution functions of certain random variables. In this section, we consider only univariate distributions.


10.9.1. Applications of Hermite Polynomials 

Hermite polynomials provide a convenient tool for approximating density functions and quantiles of distributions using convergent series. They are associated with the normal distribution, and it is therefore not surprising that they come up in various investigations in statistics and probability theory. Here are some examples.

10.9.1.1. Approximation of Density Functions and Quantiles of Distributions 

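Let $\phi(x)$ denote the density function of the standard normal distribution, that is,

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty. \qquad (10.44)$$

We recall from Section 10.5 that the sequence $\{H_n(x)\}_{n=0}^\infty$ of Hermite polynomials is orthogonal with respect to $w(x) = e^{-x^2/2}$, and that

$$H_n(x) = x^n - \frac{n^{[2]}}{2\cdot 1!}x^{n-2} + \frac{n^{[4]}}{2^2\cdot 2!}x^{n-4} - \frac{n^{[6]}}{2^3\cdot 3!}x^{n-6} + \cdots, \qquad (10.45)$$

where $n^{[r]} = n(n-1)(n-2)\cdots(n-r+1)$. Suppose now that $g(x)$ is a density function for some continuous distribution. We can represent $g(x)$ as a series of the form

$$g(x) = \sum_{n=0}^\infty b_nH_n(x)\phi(x), \qquad (10.46)$$

where, as in formula (10.32),

$$b_n = \frac{1}{n!}\int_{-\infty}^\infty g(x)H_n(x)\,dx. \qquad (10.47)$$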


By substituting $H_n(x)$, as given by formula (10.45), in formula (10.47), we obtain an expression for $b_n$ in terms of the central moments, $\mu_0, \mu_1, \ldots, \mu_n, \ldots$, of the distribution whose density function is $g(x)$. These moments are defined as

$$\mu_n = \int_{-\infty}^\infty (x - \mu)^n g(x)\,dx, \quad n = 0, 1, 2, \ldots,$$

where $\mu$ is the mean of the distribution. Note that $\mu_0 = 1$, $\mu_1 = 0$, and $\mu_2 = \sigma^2$, the variance of the distribution. In particular, if $\mu = 0$, then

$$b_0 = 1,$$

$$b_1 = 0,$$

$$b_2 = \tfrac{1}{2}(\mu_2 - 1),$$

$$b_3 = \tfrac{1}{6}\mu_3,$$

$$b_4 = \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3),$$

$$b_5 = \tfrac{1}{120}(\mu_5 - 10\mu_3),$$

$$b_6 = \tfrac{1}{720}(\mu_6 - 15\mu_4 + 45\mu_2 - 15),$$

$$\vdots$$

The expression for $g(x)$ in formula (10.46) can then be written as

$$g(x) = \phi(x)\left[1 + \tfrac{1}{2}(\mu_2 - 1)H_2(x) + \tfrac{1}{6}\mu_3H_3(x) + \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3)H_4(x) + \cdots\right]. \qquad (10.48)$$

This expression is known as the Gram-Charlier series of type A.
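Truncating (10.48) after $H_4(x)$ gives a density-like function built from the first four moments. The sketch below, which assumes mean 0, computes this truncation. It then verifies by Gauss-HermiteE quadrature that the truncation reproduces the moments of orders 0 through 4 that were fed into it. The quadrature is exact here because each integrand is a polynomial times $e^{-x^2/2}$. The truncated series need not be nonnegative for every choice of moments. The helper names and the moment values are illustrative.

```python
import numpy as np
from math import pi, sqrt
from numpy.polynomial.hermite_e import hermegauss, hermeval


def gram_charlier_coeffs(mu2, mu3, mu4):
    """b_0..b_4 for a distribution with mean 0 and central moments mu2, mu3, mu4."""
    return [1.0, 0.0, (mu2 - 1) / 2, mu3 / 6, (mu4 - 6 * mu2 + 3) / 24]


def gram_charlier_density(x, b):
    """Truncated type A series (10.48): phi(x) * sum_n b_n H_n(x)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-x**2 / 2) / sqrt(2 * pi) * hermeval(x, b)


def series_moments(b, k_max=4, npts=20):
    """int x^k g(x) dx, k = 0..k_max, for the truncated series (quadrature for exp(-x^2/2))."""
    x, w = hermegauss(npts)
    poly_part = hermeval(x, b) / sqrt(2 * pi)
    return [float(np.sum(w * poly_part * x**k)) for k in range(k_max + 1)]


b = gram_charlier_coeffs(mu2=1.0, mu3=0.5, mu4=3.6)
print(series_moments(b))   # approximately [1, 0, 1.0, 0.5, 3.6]
```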

Thus the Gram-Charlier series provides an expansion of $g(x)$ in terms of its central moments, the standard normal density, and Hermite polynomials. Using formulas (10.21) and (10.46), we note that $g(x)$ can be expressed as a series of derivatives of $\phi(x)$ of the form

$$g(x) = \sum_{n=0}^\infty c_n\,\frac{d^n\phi(x)}{dx^n}, \qquad (10.49)$$

where

$$c_n = (-1)^nb_n, \quad n = 0, 1, 2, \ldots,$$

since, by (10.21), $H_n(x)\phi(x) = (-1)^n\,d^n\phi(x)/dx^n$.

Cramér (1946, page 223) gave conditions for the convergence of the series on the right-hand side of formula (10.49), namely, if $g(x)$ is continuous and of bounded variation on $(-\infty, \infty)$, and if the integral $\int_{-\infty}^\infty g(x)\exp(x^2/4)\,dx$ is convergent, then the series in formula (10.49) will converge for every $x$ to $g(x)$.

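We can utilize Gram-Charlier series to find the upper $\alpha$-quantile, $x_\alpha$, of the distribution with the density function $g(x)$. This point is defined as

$$\int_{-\infty}^{x_\alpha} g(x)\,dx = 1 - \alpha.$$

From (10.46) we have that

$$g(x) = \phi(x) + \sum_{n=2}^\infty b_nH_n(x)\phi(x).$$

Then

$$\int_{-\infty}^{x_\alpha} g(x)\,dx = \int_{-\infty}^{x_\alpha}\phi(x)\,dx + \sum_{n=2}^\infty b_n\int_{-\infty}^{x_\alpha}H_n(x)\phi(x)\,dx. \qquad (10.50)$$

However,

$$\int_{-\infty}^{x_\alpha}H_n(x)\phi(x)\,dx = -H_{n-1}(x_\alpha)\phi(x_\alpha).$$

To prove this equality we note that by formula (10.21)

$$\int_{-\infty}^{x_\alpha}H_n(x)\phi(x)\,dx = (-1)^n\int_{-\infty}^{x_\alpha}\left(\frac{d}{dx}\right)^n\phi(x)\,dx = (-1)^n\left(\frac{d}{dx}\right)^{n-1}\phi(x_\alpha),$$

where $(d/dx)^{n-1}\phi(x_\alpha)$ denotes the value of the $(n-1)$st derivative of $\phi(x)$ at $x_\alpha$. By applying formula (10.21) again we obtain

$$\int_{-\infty}^{x_\alpha}H_n(x)\phi(x)\,dx = (-1)^n(-1)^{n-1}H_{n-1}(x_\alpha)\phi(x_\alpha) = -H_{n-1}(x_\alpha)\phi(x_\alpha).$$

By making the substitution in formula (10.50), we get

$$\int_{-\infty}^{x_\alpha} g(x)\,dx = \int_{-\infty}^{x_\alpha}\phi(x)\,dx - \sum_{n=2}^\infty b_nH_{n-1}(x_\alpha)\phi(x_\alpha). \qquad (10.51)$$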
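Formula (10.51) gives the distribution function of a truncated Gram-Charlier series without numerical integration, so $x_\alpha$ can be found by root finding. The sketch below makes two assumptions: $b_0 = 1$ and $b_1 = 0$ (mean 0), and the truncated distribution function crosses $1 - \alpha$ inside $[-8, 8]$. It uses plain bisection. It checks (10.51) by differentiating numerically and comparing with the truncated density (10.46). The coefficient values are purely illustrative and do not come from any data set, and the function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def hermite(n, x):
    """H_n(x) from (10.25): H_{k+1} = x H_k - k H_{k-1}, H_0 = 1, H_1 = x."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def phi(x):
    return exp(-x * x / 2) / sqrt(2 * pi)


def gc_density(x, b):
    """Truncated (10.46): phi(x) * sum_n b_n H_n(x), with b = [b_0, b_1, ...]."""
    return phi(x) * sum(bn * hermite(n, x) for n, bn in enumerate(b))


def gc_cdf(x, b):
    """Truncated (10.51): Phi(x) - phi(x) * sum_{n>=2} b_n H_{n-1}(x); assumes b_0 = 1, b_1 = 0."""
    Phi = 0.5 * (1 + erf(x / sqrt(2)))
    return Phi - phi(x) * sum(b[n] * hermite(n - 1, x) for n in range(2, len(b)))


def gc_upper_quantile(alpha, b, lo=-8.0, hi=8.0, tol=1e-12):
    """x_alpha with gc_cdf(x_alpha) = 1 - alpha, found by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gc_cdf(mid, b) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = [1.0, 0.0, 0.05, 0.04, 0.01]                # illustrative b_0..b_4
print(gc_upper_quantile(0.05, [1.0, 0.0]))      # reduces to z_0.05 when b_n = 0 for n >= 2
print(gc_upper_quantile(0.05, b))
```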



Now, suppose that $z_\alpha$ is the upper $\alpha$-quantile of the standard normal distribution. Then

$$\int_{-\infty}^{x_\alpha} g(x)\,dx = 1 - \alpha = \int_{-\infty}^{z_\alpha}\phi(x)\,dx.$$

Using the expansion (10.51), we obtain

$$\int_{-\infty}^{x_\alpha}\phi(x)\,dx - \sum_{n=2}^\infty b_nH_{n-1}(x_\alpha)\phi(x_\alpha) = \int_{-\infty}^{z_\alpha}\phi(x)\,dx. \qquad (10.52)$$

If we expand the right-hand side of equation (10.52) using Taylor's series in a neighborhood of $x_\alpha$, we get

$$\int_{-\infty}^{z_\alpha}\phi(x)\,dx = \int_{-\infty}^{x_\alpha}\phi(x)\,dx + \sum_{j=1}^\infty \frac{(z_\alpha - x_\alpha)^j}{j!}\left(\frac{d}{dx}\right)^{j-1}\phi(x_\alpha)$$

$$= \int_{-\infty}^{x_\alpha}\phi(x)\,dx + \sum_{j=1}^\infty \frac{(z_\alpha - x_\alpha)^j}{j!}(-1)^{j-1}H_{j-1}(x_\alpha)\phi(x_\alpha), \quad \text{using formula (10.21)}$$

$$= \int_{-\infty}^{x_\alpha}\phi(x)\,dx - \sum_{j=1}^\infty \frac{(x_\alpha - z_\alpha)^j}{j!}H_{j-1}(x_\alpha)\phi(x_\alpha). \qquad (10.53)$$

From formulas (10.52) and (10.53) we conclude that

$$\sum_{n=2}^\infty b_nH_{n-1}(x_\alpha)\phi(x_\alpha) = \sum_{j=1}^\infty \frac{(x_\alpha - z_\alpha)^j}{j!}H_{j-1}(x_\alpha)\phi(x_\alpha).$$

By dividing both sides by $\phi(x_\alpha)$ we obtain

$$\sum_{n=2}^\infty b_nH_{n-1}(x_\alpha) = \sum_{j=1}^\infty \frac{(x_\alpha - z_\alpha)^j}{j!}H_{j-1}(x_\alpha). \qquad (10.54)$$

This provides a relationship between $x_\alpha$, the $\alpha$-quantile of the distribution with the density function $g(x)$, and the corresponding quantile for the standard normal. Since the $b_n$'s are functions of the moments associated with $g(x)$, it is possible to use (10.54) to express $x_\alpha$ in terms of $z_\alpha$ and the moments of $g(x)$. This was carried out by Cornish and Fisher (1937). They provided an expansion for $x_\alpha$ in terms of $z_\alpha$ and the cumulants (instead of the moments) associated with $g(x)$. (See Section 5.6.2 for a definition of cumulants. Note that there is a one-to-one correspondence between moments and cumulants.) Such an expansion became known as the Cornish-Fisher expansion. It is reported in Johnson and Kotz (1970, page 34) (see Exercise 10.11). See also Kendall and Stuart (1977, pages 175-178).


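10.9.1.2. Approximation of a Normal Integral

A convergent series representing the integral

$$\psi(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$$

was derived by Kerridge and Cook (1976). Their method is based on the fact that

$$\int_0^x f(t)\,dt = 2\sum_{n=0}^\infty \frac{(x/2)^{2n+1}}{(2n+1)!}\,f^{(2n)}\!\left(\frac{x}{2}\right) \qquad (10.55)$$

for any function $f(t)$ with a suitably convergent Taylor's expansion in a neighborhood of $x/2$, namely,

$$f(t) = \sum_{n=0}^\infty \frac{1}{n!}\left(t - \frac{x}{2}\right)^n f^{(n)}\!\left(\frac{x}{2}\right). \qquad (10.56)$$

Formula (10.55) results from integrating (10.56) with respect to $t$ from 0 to $x$ and noting that the terms with odd $n$ vanish. Taking $f(t) = e^{-t^2/2}$, we obtain

$$\int_0^x e^{-t^2/2}\,dt = 2\sum_{n=0}^\infty \frac{(x/2)^{2n+1}}{(2n+1)!}\,f^{(2n)}\!\left(\frac{x}{2}\right). \qquad (10.57)$$

Using the Rodrigues formula for Hermite polynomials [formula (10.21)], we get

$$f^{(n)}(x) = \frac{d^n(e^{-x^2/2})}{dx^n} = (-1)^n e^{-x^2/2}H_n(x), \quad n = 0, 1, \ldots.$$

By making the substitution in (10.57), we find

$$\int_0^x e^{-t^2/2}\,dt = 2\sum_{n=0}^\infty \frac{(x/2)^{2n+1}}{(2n+1)!}\,e^{-x^2/8}H_{2n}\!\left(\frac{x}{2}\right) = 2e^{-x^2/8}\sum_{n=0}^\infty \frac{(x/2)^{2n+1}}{(2n+1)!}\,H_{2n}\!\left(\frac{x}{2}\right). \qquad (10.58)$$

This expression can be simplified by letting $\theta_n(x) = x^nH_n(x)/n!$ in (10.58), which gives

$$\int_0^x e^{-t^2/2}\,dt = xe^{-x^2/8}\sum_{n=0}^\infty \frac{\theta_{2n}(x/2)}{2n+1}.$$

Hence,

$$\psi(x) = \frac{x}{\sqrt{2\pi}}\,e^{-x^2/8}\sum_{n=0}^\infty \frac{\theta_{2n}(x/2)}{2n+1}. \qquad (10.59)$$

Note that on the basis of formula (10.25), the recurrence relation for $\theta_n(x)$ is given by

$$\theta_{n+1}(x) = \frac{x^2}{n+1}\left[\theta_n(x) - \theta_{n-1}(x)\right], \quad n = 1, 2, \ldots,$$

with $\theta_0(x) = 1$ and $\theta_1(x) = x^2$.

The $\theta_n(x)$'s are easier to handle numerically than the Hermite polynomials, as they remain relatively small, even for large $n$. Kerridge and Cook (1976) report that the series in (10.59) is accurate over a wide range of $x$. Divgi (1979), however, states that the convergence of the series becomes slower as $x$ increases.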
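Formula (10.59), together with the $\theta_n$ recurrence, translates into a short loop. The sketch below is a minimal version. It sums a fixed number of terms, an arbitrary choice that suits moderate $|x|$, rather than using an adaptive stopping rule. The name `psi_kerridge_cook` is hypothetical. The result is compared with $\psi(x) = \frac{1}{2}\operatorname{erf}(x/\sqrt{2})$ from Python's `math` module.

```python
from math import erf, exp, pi, sqrt


def psi_kerridge_cook(x, n_terms=60):
    """psi(x) = (1/sqrt(2 pi)) int_0^x exp(-t^2/2) dt via (10.59), with theta_n(y) = y^n H_n(y)/n!
    generated by theta_{n+1} = y^2 (theta_n - theta_{n-1}) / (n+1), where y = x/2."""
    y2 = (x / 2) ** 2
    theta_prev, theta = 1.0, y2            # theta_0 = 1, theta_1 = y * H_1(y) = y^2
    total = theta_prev                     # n = 0 term: theta_0 / 1
    for n in range(1, 2 * n_terms):
        theta_prev, theta = theta, y2 * (theta - theta_prev) / (n + 1)   # theta now = theta_{n+1}
        if (n + 1) % 2 == 0:
            total += theta / (n + 2)       # term theta_{2m} / (2m + 1) with 2m = n + 1
    return x * exp(-x * x / 8) / sqrt(2 * pi) * total


print(psi_kerridge_cook(1.96), 0.5 * erf(1.96 / sqrt(2)))
```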


10.9.1.3. Estimation of Unknown Densities 

Let $X_1, X_2, \ldots, X_n$ represent a sequence of independent random variables with a common, but unknown, density function $f(x)$ assumed to be square integrable. Applying (10.31) to $e^{x^2/4}f(x)$, we have the representation

$$f(x) = e^{-x^2/4}\sum_{j=0}^\infty c_jH_j(x),$$

or equivalently,

$$f(x) = \sum_{j=0}^\infty a_jh_j(x), \qquad (10.60)$$

where $h_j(x)$ is the so-called normalized Hermite polynomial of degree $j$, namely,

$$h_j(x) = \frac{e^{-x^2/4}H_j(x)}{(\sqrt{2\pi}\,j!)^{1/2}}, \quad j = 0, 1, \ldots,$$

and $a_j = \int_{-\infty}^\infty f(x)h_j(x)\,dx$, since $\int_{-\infty}^\infty h_j^2(x)\,dx = 1$ by virtue of (10.30). Schwartz (1967) considered an estimate of $f(x)$ of the form

$$\hat f_n(x) = \sum_{j=0}^{q(n)}\hat a_{jn}h_j(x),$$

where

$$\hat a_{jn} = \frac{1}{n}\sum_{k=1}^n h_j(X_k),$$

and $q(n)$ is a suitably chosen integer dependent on $n$ such that $q(n) = o(n)$ as $n \to \infty$. Under these conditions, Schwartz (1967, Theorem 1) showed that $\hat f_n(x)$ is a consistent estimator of $f(x)$ in the mean integrated squared error sense, that is,

$$\lim_{n\to\infty}E\left\{\int_{-\infty}^\infty[\hat f_n(x) - f(x)]^2\,dx\right\} = 0.$$

Under additional conditions on $f(x)$, $\hat f_n(x)$ is also consistent in the mean squared error sense, that is,

$$\lim_{n\to\infty}E[f(x) - \hat f_n(x)]^2 = 0$$

uniformly in $x$.
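Schwartz's estimator is straightforward to code once $h_j(x)$ is available. The sketch below is a minimal illustration that assumes a fixed truncation point $q = 10$ rather than a rule $q(n)$. It uses simulated standard normal data, so the true density at 0 is $1/\sqrt{2\pi}$. The helper names `h` and `schwartz_estimate` are hypothetical.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermeval


def h(j, x):
    """Normalized Hermite function h_j(x) = exp(-x^2/4) H_j(x) / (sqrt(2 pi) j!)^{1/2}."""
    x = np.asarray(x, dtype=float)
    unit = np.zeros(j + 1)
    unit[j] = 1.0
    return np.exp(-x**2 / 4) * hermeval(x, unit) / sqrt(sqrt(2 * pi) * factorial(j))


def schwartz_estimate(x, sample, q):
    """f_hat(x) = sum_{j=0}^{q} a_hat_j h_j(x), with a_hat_j the sample mean of h_j(X_k)."""
    return sum(np.mean(h(j, sample)) * h(j, x) for j in range(q + 1))


rng = np.random.default_rng(12345)
sample = rng.standard_normal(20_000)      # simulated data; true density is phi(x)
print(schwartz_estimate(0.0, sample, q=10), 1 / sqrt(2 * pi))
```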

10.9.2. Applications of Jacobi and Laguerre Polynomials 

Dasgupta (1968) presented an approximation to the distribution function of $X = \frac{1}{2}(r + 1)$, where $r$ is the sample correlation coefficient, in terms of a beta density and Jacobi polynomials. Similar methods were used by Durbin and Watson (1951) in deriving an approximation of the distribution of a statistic used for testing serial correlation in least-squares regression.

Quadratic forms in random variables that can often be regarded as having a joint multivariate normal distribution play an important role in analysis of variance and in estimation of variance components for a random or a mixed model. Approximation of the distributions of such quadratic forms can be carried out using Laguerre polynomials (see, for example, Gurland, 1953, and Johnson and Kotz, 1968). Tiku (1964a) developed Laguerre series expansions of the distribution functions of the nonnormal variance ratios used for testing the homogeneity of treatment means in the case of one-way classification for analysis of variance with nonidentical group-to-group error distributions that are not assumed to be normal. Tiku (1964b) also used Laguerre polynomials to obtain an approximation to the first negative moment of a Poisson random variable, that is, the value of $E(X^{-1})$, where $X$ has the Poisson distribution.

More recently, Schöne and Schmid (2000) made use of Laguerre polynomials to develop a series representation of the joint density and the joint distribution of a quadratic form and a linear form in normal variables. Such a representation can be used to calculate, for example, the joint density and the joint distribution function of the sample mean and sample variance. Note that for autocorrelated variables, the sample mean and sample variance are, in general, not independent.

10.9.3. Calculation of Hypergeometric Probabilities Using Discrete
Chebyshev Polynomials

The hypergeometric distribution is a discrete distribution, somewhat related
to the binomial distribution. Suppose, for example, we have a lot of $M$ items,
$r$ of which are defective and $M-r$ of which are nondefective. Suppose that
we choose at random $m$ items without replacement from the lot ($m < M$).
Let $X$ be the number of defectives found. Then, the probability that $X=k$
is given by

$$P(X=k) = \frac{\binom{r}{k}\binom{M-r}{m-k}}{\binom{M}{m}}, \tag{10.61}$$

where $\max(0, m-M+r) \le k \le \min(m, r)$. A random variable with the probability mass function (10.61) is said to have a hypergeometric distribution. We
denote such a probability function by $h(k; m, r, M)$.
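As a quick illustration of (10.61), the following minimal sketch evaluates $h(k; m, r, M)$ in exact rational arithmetic with Python's standard `math.comb` and `fractions.Fraction`. The function name `hyper_pmf` and the example values $M = 12$, $r = 4$, $m = 5$ are our own illustrative choices. Because `math.comb(n, j)` returns 0 when $j > n$, values of $k$ outside the support $\max(0, m-M+r) \le k \le \min(m, r)$ automatically receive probability zero.

```python
from fractions import Fraction
from math import comb


def hyper_pmf(k: int, m: int, r: int, M: int) -> Fraction:
    """h(k; m, r, M) of (10.61): probability of k defectives in a sample of m
    drawn without replacement from M items, r of which are defective."""
    if k < 0 or k > m:
        return Fraction(0)
    # comb(n, j) is 0 when j > n, so k outside max(0, m-M+r)..min(m, r) gives 0
    return Fraction(comb(r, k) * comb(M - r, m - k), comb(M, m))


# Lot of M = 12 items, r = 4 defective, sample of m = 5
probs = [hyper_pmf(k, 5, 4, 12) for k in range(6)]
print(probs)       # exact probabilities for k = 0, 1, ..., 5
print(sum(probs))  # the probabilities sum to 1
```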

There are tables for computing the probability value in (10.61) (see, for
example, the tables given by Lieberman and Owen, 1961). There are also
several algorithms for computing this probability. Recently, Alvo and Cabilio
(2000) proposed to represent the hypergeometric distribution in terms of
discrete Chebyshev polynomials, as was seen in Section 10.8. The following is
a summary of this work: Consider the sequence $\{t_n(k, m)\}_{n=0}^{m}$ of discrete
Chebyshev polynomials defined over the set of integers $k = 0, 1, 2, \ldots, m$ [see
formula (10.42)], which is given by

$$t_n(k, m) = \sum_{i=0}^{n} (-1)^i \binom{n}{i}\binom{n+i}{i}\frac{k!\,(m-i)!}{(k-i)!\,m!}, \qquad n = 0, 1, \ldots, m; \quad k = 0, 1, \ldots, m, \tag{10.62}$$

where the ratio $k!/(k-i)!$ is taken to be zero whenever $i > k$.
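The explicit sum (10.62) is easy to evaluate exactly. The sketch below (the function name `t` is ours) computes $t_n(k, m)$ with `math.perm`, which returns the falling factorial $k!/(k-i)!$ and gives 0 when $i > k$. It then checks numerically, for the illustrative choice $m = 5$, that the value vectors over $k = 0, 1, \ldots, m$ are mutually orthogonal with equal weights, which is the property that makes an expansion in these polynomials possible.

```python
from fractions import Fraction
from math import comb, perm


def t(n: int, k: int, m: int) -> Fraction:
    """Discrete Chebyshev polynomial t_n(k, m) from the explicit sum (10.62)."""
    # perm(k, i) = k!/(k-i)! (zero when i > k); perm(m, i) = m!/(m-i)!
    return sum(
        (-1) ** i * comb(n, i) * comb(n + i, i) * Fraction(perm(k, i), perm(m, i))
        for i in range(n + 1)
    )


m = 5
# Row n holds the values t_n(0, m), ..., t_n(m, m)
table = [[t(n, k, m) for k in range(m + 1)] for n in range(m + 1)]
print(table[1])  # t_1(k, m) = 1 - 2k/m

# Distinct rows are orthogonal with respect to equal weights on k = 0, ..., m
for a in range(m + 1):
    for b in range(a + 1, m + 1):
        assert sum(x * y for x, y in zip(table[a], table[b])) == 0
```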


Let $X$ have the hypergeometric distribution as in (10.61). Then according to
Theorem 1 in Alvo and Cabilio (2000),

$$\sum_{k=0}^{m} t_n(k, m)\, h(k; m, r, M) = t_n(r, M) \tag{10.63}$$

for all $n = 0, 1, \ldots, m$ and $r = 0, 1, \ldots, M$. Let $\mathbf{t}_n = [t_n(0, m), t_n(1, m), \ldots, t_n(m, m)]'$, $n = 0, 1, \ldots, m$, be the base vectors in an $(m+1)$-dimensional Euclidean space determined from the Chebyshev polynomials.
Let $g(k)$ be any function defined over the set of integers $k = 0, 1, \ldots, m$.
Then $g(k)$ can be expressed as

$$g(k) = \sum_{n=0}^{m} g_n\, t_n(k, m), \qquad k = 0, 1, \ldots, m, \tag{10.64}$$

where $g_n = \mathbf{g}'\mathbf{t}_n / \|\mathbf{t}_n\|^2$ and $\mathbf{g} = [g(0), g(1), \ldots, g(m)]'$. Now, using the result in (10.63), the expected value of $g(X)$ is given by

$$\begin{aligned}
E[g(X)] &= \sum_{k=0}^{m} g(k)\, h(k; m, r, M) \\
&= \sum_{k=0}^{m}\sum_{n=0}^{m} g_n\, t_n(k, m)\, h(k; m, r, M) \\
&= \sum_{n=0}^{m} g_n \sum_{k=0}^{m} t_n(k, m)\, h(k; m, r, M) \\
&= \sum_{n=0}^{m} g_n\, t_n(r, M).
\end{aligned} \tag{10.65}$$

This shows that the expected value of $g(X)$ can be computed from knowledge of the coefficients $g_n$ and the discrete Chebyshev polynomials up to
order $m$, evaluated at $r$ and $M$.
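A minimal sketch of (10.64) and (10.65) in exact arithmetic follows. It recomputes $t_n(k, m)$ from (10.62), forms $g_n = \mathbf{g}'\mathbf{t}_n/\|\mathbf{t}_n\|^2$, and compares $\sum_n g_n t_n(r, M)$ with the direct sum $\sum_k g(k)h(k; m, r, M)$. The function names and the test values $m = 5$, $M = 12$ are illustrative choices of ours, not taken from Alvo and Cabilio (2000).

```python
from fractions import Fraction
from math import comb, perm


def t(n, k, m):
    """Discrete Chebyshev polynomial t_n(k, m), explicit sum (10.62)."""
    return sum(
        (-1) ** i * comb(n, i) * comb(n + i, i) * Fraction(perm(k, i), perm(m, i))
        for i in range(n + 1)
    )


def expectation_chebyshev(g, m, r, M):
    """E[g(X)] for X with pmf h(.; m, r, M), via the expansion (10.64)-(10.65)."""
    gvec = [Fraction(g(k)) for k in range(m + 1)]
    total = Fraction(0)
    for n in range(m + 1):
        tn = [t(n, k, m) for k in range(m + 1)]
        g_n = sum(x * y for x, y in zip(gvec, tn)) / sum(y * y for y in tn)
        total += g_n * t(n, r, M)  # the pmf itself is never needed
    return total


def expectation_direct(g, m, r, M):
    """E[g(X)] summed directly against the probabilities (10.61)."""
    return sum(
        Fraction(g(k)) * Fraction(comb(r, k) * comb(M - r, m - k), comb(M, m))
        for k in range(m + 1)
    )


cube = lambda x: x ** 3
print(expectation_chebyshev(cube, 5, 4, 12), expectation_direct(cube, 5, 4, 12))
```

The design point is that the distribution enters only through the numbers $t_n(r, M)$, which is exactly what (10.65) says.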

In particular, if $g(x)$ is an indicator function taking the value one at $x = k$
and the value zero elsewhere, then

$$E[g(X)] = P(X = k) = h(k; m, r, M).$$

In this case $\mathbf{g}'\mathbf{t}_n = t_n(k, m)$, so that $g_n = t_n(k, m)/\|\mathbf{t}_n\|^2$. Applying the result in (10.65), we then obtain

$$h(k; m, r, M) = \sum_{n=0}^{m} \frac{t_n(k, m)\, t_n(r, M)}{\|\mathbf{t}_n\|^2}. \tag{10.66}$$

Because of the recurrence relation (10.43) for discrete Chebyshev polynomials, calculating the hypergeometric probability using (10.66) can be done
simply on a computer.
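The sketch below evaluates (10.66) and checks it against the closed form (10.61) for every admissible $(k, m, r, M)$ with $M \le 6$. It uses the explicit sum (10.62) rather than the recurrence (10.43), whose exact form is given in Section 10.8 and is not reproduced here; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, perm


def t(n, k, m):
    """Discrete Chebyshev polynomial t_n(k, m), explicit sum (10.62)."""
    return sum(
        (-1) ** i * comb(n, i) * comb(n + i, i) * Fraction(perm(k, i), perm(m, i))
        for i in range(n + 1)
    )


def hyper_pmf_chebyshev(k, m, r, M):
    """h(k; m, r, M) from the Chebyshev expansion (10.66)."""
    total = Fraction(0)
    for n in range(m + 1):
        norm_sq = sum(t(n, j, m) ** 2 for j in range(m + 1))  # ||t_n||^2
        total += t(n, k, m) * t(n, r, M) / norm_sq
    return total


def hyper_pmf_direct(k, m, r, M):
    """h(k; m, r, M) from the binomial-coefficient formula (10.61)."""
    return Fraction(comb(r, k) * comb(M - r, m - k), comb(M, m))


for k in range(6):
    print(k, hyper_pmf_chebyshev(k, 5, 4, 12), hyper_pmf_direct(k, 5, 4, 12))
```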


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Abramowitz, M., and I. A. Stegun (1964). Handbook of Mathematical Functions with 
Formulas, Graphs, and Mathematical Tables. Wiley, New York. (This useful 
volume was prepared by the National Bureau of Standards. It was edited by 
Milton Abramowitz and Irene A. Stegun.) 

Alvo, M., and P. Cabilio (2000). “Calculation of hypergeometric probabilities using
Chebyshev polynomials.” Amer. Statist., 54, 141-144.

Cheney, E. W. (1982). Introduction to Approximation Theory, 2nd ed. Chelsea, New 
York. (Least-squares polynomial approximation is discussed in Chap. 4.) 



Chihara, T. S. (1978). An Introduction to Orthogonal Polynomials. Gordon and Breach,
New York. (This text deals with the general theory of orthogonal polynomials,
including recurrence relations, and some particular systems of orthogonal polynomials.)

Cornish, E. A., and R. A. Fisher (1937). “Moments and cumulants in the specification
of distributions.” Rev. Internat. Statist. Inst., 5, 307-320.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press,
Princeton. (This classic book provides the mathematical foundation of statistics.
Chap. 17 is a good source for approximation of density functions.)

Dasgupta, P. (1968). “An approximation to the distribution of sample correlation
coefficient, when the population is non-normal.” Sankhyā, Ser. B, 30, 425-428.

Davis, P. J. (1975). Interpolation and Approximation. Dover, New York. (Chaps. 8 and
10 discuss least-squares approximation and orthogonal polynomials.)

Divgi, D. R. (1979). “Calculation of univariate and bivariate normal probability
functions.” Ann. Statist., 7, 903-910.

Durbin, J., and G. S. Watson (1951). “Testing for serial correlation in least-squares
regression II.” Biometrika, 38, 159-178.

Freud, G. (1971). Orthogonal Polynomials. Pergamon Press, Oxford. (This book deals
with fundamental properties of orthogonal polynomials, including Legendre,
Chebyshev, and Jacobi polynomials. Convergence theory of series of orthogonal
polynomials is discussed in Chap. 4.)

Gurland, J. (1953). “Distributions of quadratic forms and ratios of quadratic forms.”
Ann. Math. Statist., 24, 416-427.

Jackson, D. (1941). Fourier Series and Orthogonal Polynomials. Mathematical Association of America. (This classic monograph provides a good coverage of orthogonal
polynomials, including Legendre, Jacobi, Hermite, and Laguerre polynomials.
The presentation is informative and easy to follow.)

Johnson, N. L., and S. Kotz (1968). “Tables of distributions of positive definite
quadratic forms in central normal variables.” Sankhyā, Ser. B, 30, 303-314.

Johnson, N. L., and S. Kotz (1970). Continuous Univariate Distributions-1. Houghton
Mifflin, Boston. (Chap. 12 contains a good discussion concerning the
Cornish-Fisher expansion of quantiles.)

Kendall, M. G., and A. Stuart (1977). The Advanced Theory of Statistics, Vol. 1, 4th ed.
Macmillan, New York. (This classic book provides a good source for learning
about the Gram-Charlier series of Type A and the Cornish-Fisher expansion.)

Kerridge, D. F., and G. W. Cook (1976). “Yet another series for the normal integral.”
Biometrika, 63, 401-403.

Lieberman, G. J., and D. B. Owen (1961). Tables of the Hypergeometric Probability
Distribution. Stanford University Press, Palo Alto, California.

Ralston, A., and P. Rabinowitz (1978). A First Course in Numerical Analysis.
McGraw-Hill, New York. (Chap. 7 discusses Chebyshev polynomials of the first
kind.)

Rivlin, T. (1990). Chebyshev Polynomials, 2nd ed. Wiley, New York. (This book gives a
survey of the most important properties of Chebyshev polynomials.)

Schöne, A., and W. Schmid (2000). “On the joint distribution of a quadratic and a
linear form in normal variables.” J. Mult. Analysis, 72, 163-182.

Schwartz, S. C. (1967). “Estimation of probability density by an orthogonal series.”
Ann. Math. Statist., 38, 1261-1265.

Subrahmaniam, K. (1966). “Some contributions to the theory of non-normality, I
(univariate case).” Sankhyā, Ser. A, 28, 389-406.

Szegő, G. (1975). Orthogonal Polynomials, 4th ed. Amer. Math. Soc., Providence,
Rhode Island. (This much-referenced book provides a thorough coverage of
orthogonal polynomials.)

Tiku, M. L. (1964a). “Approximating the general non-normal variance ratio sampling
distributions.” Biometrika, 51, 83-95.

Tiku, M. L. (1964b). “A note on the negative moments of a truncated Poisson
variate.” J. Amer. Statist. Assoc., 59, 1220-1224.

Viskov, O. V. (1992). “Some remarks on Hermite polynomials.” Theory of Probability
and Its Applications, 36, 633-637.


EXERCISES 


In Mathematics 


10.1. Show that the sequence $\{1/\sqrt{2}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx, \ldots\}$ is orthonormal with respect to $w(x) = 1/\pi$ over $[-\pi, \pi]$.

10.2. Let $\{p_n(x)\}_{n=0}^{\infty}$ be a sequence of Legendre polynomials.

(a) Use the Rodrigues formula to show that

(i) $\int_{-1}^{1} x^m p_n(x)\,dx = 0$ for $m = 0, 1, \ldots, n-1$,

(ii) $\int_{-1}^{1} x^n p_n(x)\,dx = \dfrac{2^{n+1}}{2n+1}\dbinom{2n}{n}^{-1}$, $n = 0, 1, 2, \ldots$.

(b) Deduce from (a) that $\int_{-1}^{1} p_n(x)\,\pi_{n-1}(x)\,dx = 0$, where $\pi_{n-1}(x)$ denotes an arbitrary polynomial of degree at most equal to $n-1$.

(c) Make use of (a) and (b) to show that $\int_{-1}^{1} p_n^2(x)\,dx = 2/(2n+1)$, $n = 0, 1, \ldots$.

10.3. Let $\{T_n(x)\}_{n=0}^{\infty}$ be a sequence of Chebyshev polynomials of the first kind. Let $\zeta_i = \cos[(2i-1)\pi/(2n)]$, $i = 1, 2, \ldots, n$.

(a) Verify that $\zeta_1, \zeta_2, \ldots, \zeta_n$ are zeros of $T_n(x)$, that is, $T_n(\zeta_i) = 0$, $i = 1, 2, \ldots, n$.

(b) Show that $\zeta_1, \zeta_2, \ldots, \zeta_n$ are simple zeros of $T_n(x)$.
[Hint: Show that $T_n'(\zeta_i) \ne 0$ for $i = 1, 2, \ldots, n$.]

10.4. Let $\{H_n(x)\}_{n=0}^{\infty}$ be a sequence of Hermite polynomials. Show that

(a) $\dfrac{dH_n(x)}{dx} = nH_{n-1}(x)$,

(b) $\dfrac{d^2H_n(x)}{dx^2} - x\dfrac{dH_n(x)}{dx} + nH_n(x) = 0$.


10.5. Let $\{T_n(x)\}_{n=0}^{\infty}$ and $\{U_n(x)\}_{n=0}^{\infty}$ be sequences of Chebyshev polynomials of the first and second kinds, respectively.

(a) Show that $|U_n(x)| \le n+1$ for $-1 \le x \le 1$.
[Hint: Use the representation (10.18) and mathematical induction on $n$.]

(b) Show that $|dT_n(x)/dx| \le n^2$ for all $-1 \le x \le 1$, with equality holding only if $x = \pm 1$ $(n \ge 2)$.

(c) Show that
>-l l/l-I» Il 

for — 1 <x < 1 and n i=0. 


10.6. Show that the Laguerre polynomial $L_n(x)$ of degree $n$ satisfies the differential equation

$$x\frac{d^2L_n(x)}{dx^2} + (\alpha + 1 - x)\frac{dL_n(x)}{dx} + nL_n(x) = 0.$$


10.7. Consider the function

$$H(x, t) = \frac{1}{(1-t)^{\alpha+1}}\exp\left(-\frac{xt}{1-t}\right), \qquad \alpha > -1.$$

Expand $H(x, t)$ as a power series in $t$ and let the coefficient of $t^n$ be denoted by $(-1)^n g_n(x)/n!$ so that

$$H(x, t) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}\,g_n(x)\,t^n.$$

Show that $g_n(x)$ is identical to $L_n(x)$ for all $n$, where $L_n(x)$ is the Laguerre polynomial of degree $n$.

10.8. Find the least-squares polynomial approximation of the function $f(x) = e^x$ over the interval $[-1, 1]$ by using Legendre polynomials of degree not exceeding 4.

10.9. A function $f(x)$ defined on $-1 \le x \le 1$ can be represented using an infinite series of Chebyshev polynomials of the first kind, namely,

$$f(x) = \frac{c_0}{2} + \sum_{n=1}^{\infty} c_n T_n(x),$$

where

$$c_n = \frac{2}{\pi}\int_{-1}^{1}\frac{f(x)T_n(x)}{\sqrt{1-x^2}}\,dx, \qquad n = 0, 1, \ldots.$$

This series converges uniformly whenever $f(x)$ is continuous and of bounded variation on $[-1, 1]$. Approximate the function $f(x) = e^x$ using the first five terms of the above series.


In Statistics 

10.10. Suppose that from a certain distribution with a mean equal to zero we have knowledge of the following central moments: $\mu_2 = 1.0$, $\mu_3 = -0.91$, $\mu_4 = 4.86$, $\mu_5 = -12.57$, $\mu_6 = 53.22$. Obtain an approximation for the density function of the distribution using the Gram-Charlier series of type A.

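10.11. The Cornish-Fisher expansion for $x_\alpha$, the upper $\alpha$-quantile of a certain distribution, standardized so that its mean and variance are equal to zero and one, respectively, is of the following form (see Johnson and Kotz, 1970, page 34):

$$\begin{aligned}
x_\alpha ={}& z_\alpha + \tfrac{1}{6}(z_\alpha^2 - 1)\kappa_3 + \tfrac{1}{24}(z_\alpha^3 - 3z_\alpha)\kappa_4 - \tfrac{1}{36}(2z_\alpha^3 - 5z_\alpha)\kappa_3^2 \\
&+ \tfrac{1}{120}(z_\alpha^4 - 6z_\alpha^2 + 3)\kappa_5 - \tfrac{1}{24}(z_\alpha^4 - 5z_\alpha^2 + 2)\kappa_3\kappa_4 + \tfrac{1}{324}(12z_\alpha^4 - 53z_\alpha^2 + 17)\kappa_3^3 \\
&+ \tfrac{1}{720}(z_\alpha^5 - 10z_\alpha^3 + 15z_\alpha)\kappa_6 - \tfrac{1}{180}(2z_\alpha^5 - 17z_\alpha^3 + 21z_\alpha)\kappa_3\kappa_5 \\
&- \tfrac{1}{384}(3z_\alpha^5 - 24z_\alpha^3 + 29z_\alpha)\kappa_4^2 + \tfrac{1}{288}(14z_\alpha^5 - 103z_\alpha^3 + 107z_\alpha)\kappa_3^2\kappa_4 \\
&- \tfrac{1}{7776}(252z_\alpha^5 - 1688z_\alpha^3 + 1511z_\alpha)\kappa_3^4 + \cdots,
\end{aligned}$$

where $z_\alpha$ is the upper $\alpha$-quantile of the standard normal distribution, and $\kappa_r$ is the $r$th cumulant of the distribution ($r = 3, 4, \ldots$). Apply this expansion to finding the upper 0.05-quantile of the central chi-squared distribution with $n = 5$ degrees of freedom.

[Note: The mean and variance of a central chi-squared distribution with $n$ degrees of freedom are $n$ and $2n$, respectively. Its $r$th cumulant, denoted by $\kappa_r'$, is

$$\kappa_r' = n(r-1)!\,2^{r-1}, \qquad r = 1, 2, \ldots.$$

Hence, the $r$th cumulant, $\kappa_r$, of the standardized chi-squared distribution is $\kappa_r = \kappa_r'/(2n)^{r/2}$ ($r = 2, 3, \ldots$).]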

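10.12. The normal integral $\int_0^x e^{-t^2/2}\,dt$ can be calculated from the series

$$\int_0^x e^{-t^2/2}\,dt = x - \frac{x^3}{2 \cdot 3} + \frac{x^5}{2^2 \cdot 5 \cdot 2!} - \frac{x^7}{2^3 \cdot 7 \cdot 3!} + \cdots.$$

(a) Use this series to obtain an approximate value for dt.

(b) Redo part (a) using the series given by formula (10.59), that is,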



^2n(x/2) 
2/î + 1 


(c) Compare the results from (a) and (b) with regard to the number of
terms in each series needed to achieve an answer correct to five
decimal places.

10.13. Show that the expansion given by formula (10.46) is equivalent to representing the density function $g(x)$ as a series of the form

$$g(x) = \sum_{n=0}^{\infty}\frac{c_n}{n!}\frac{d^n\phi(x)}{dx^n},$$

where $\phi(x)$ is the standard normal density function, and the $c_n$'s are constant coefficients.


10.14. Consider the random variable

$$W = \sum_{i=1}^{n} X_i^2,$$

where $X_1, X_2, \ldots, X_n$ are independent random variables from a distribution with the density function

$$f(x) = \phi(x) - \frac{\lambda_3}{6}\frac{d^3\phi(x)}{dx^3} + \frac{\lambda_4}{24}\frac{d^4\phi(x)}{dx^4} + \frac{\lambda_3^2}{72}\frac{d^6\phi(x)}{dx^6},$$

where $\phi(x)$ is the standard normal density function and the quantities $\lambda_3$ and $\lambda_4$ are, respectively, the standard measures of skewness and kurtosis for the distribution. Obtain the moment generating function of $W$, and compare it with the moment generating function of a chi-squared distribution with $n$ degrees of freedom. (See Example 6.9.8 in Section 6.9.3.)

[Hint: Use Hermite polynomials.]

10.15. A lot of $M = 10$ articles contains $r = 3$ defectives and 7 good articles.
Suppose that a sample of $m = 4$ articles is drawn from the lot without
replacement. Let $X$ denote the number of defective articles in the
sample. Find the expected value of g(X) =X^ using formula (10.65). 



CHAPTER 11


Fourier Series


Fourier series were first formalized by the French mathematician Jean-Baptiste Joseph Fourier (1768-1830) as a result of his work on solving a
particular partial differential equation known as the heat conduction equation. However, the actual introduction of the so-called Fourier theory was
motivated by a problem in musical acoustics concerning vibrating strings.
Daniel Bernoulli (1700-1782) is credited as being the first to model the
motion of a vibrating string as a series of trigonometric functions in 1748,
twenty years before the birth of Fourier. The actual development of Fourier
theory took place in 1807 upon Fourier’s return from Egypt, where he was a
participant in the Egyptian campaign of 1798 under Napoleon Bonaparte.


11.1. INTRODUCTION 

A series of the form

$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos nx + b_n\sin nx\right] \tag{11.1}$$

is called a trigonometric series. Let $f(x)$ be a function defined and Riemann integrable on the interval $[-\pi, \pi]$. By definition, the Fourier series associated with $f(x)$ is a trigonometric series of the form (11.1), where $a_n$ and $b_n$ are given by

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx, \qquad n = 0, 1, 2, \ldots, \tag{11.2}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx, \qquad n = 1, 2, \ldots. \tag{11.3}$$

In this case, we write

$$f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos nx + b_n\sin nx\right]. \tag{11.4}$$

The numbers $a_n$ and $b_n$ are called the Fourier coefficients of $f(x)$. The symbol $\sim$ is used here instead of equality because at this stage, nothing is known about the convergence of the series in (11.4) for all $x$ in $[-\pi, \pi]$. Even if the series converges, it may not converge to $f(x)$.
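Formulas (11.2) and (11.3) can also be approximated numerically when the integrals are awkward to evaluate in closed form. The following sketch uses the composite trapezoidal rule on $[-\pi, \pi]$; the quadrature rule, the number of subintervals, and the function name are our own choices, not part of the text. As a check, it is applied to $f(x) = x^2$, whose coefficients are derived in Example 11.1.2.

```python
import math


def fourier_coefficients(f, N, intervals=20000):
    """Approximate a_0..a_N of (11.2) and b_1..b_N of (11.3) by the composite
    trapezoidal rule with `intervals` equal subintervals of [-pi, pi].
    b[0] is a placeholder 0.0 so that b[n] lines up with a[n]."""
    h = 2 * math.pi / intervals
    xs = [-math.pi + j * h for j in range(intervals + 1)]
    # trapezoidal weights: h at interior nodes, h/2 at the two endpoints
    w = [h] * (intervals + 1)
    w[0] = w[-1] = h / 2
    fx = [f(x) for x in xs]
    a = [sum(wj * fj * math.cos(n * x) for wj, fj, x in zip(w, fx, xs)) / math.pi
         for n in range(N + 1)]
    b = [0.0] + [sum(wj * fj * math.sin(n * x) for wj, fj, x in zip(w, fx, xs)) / math.pi
                 for n in range(1, N + 1)]
    return a, b


a, b = fourier_coefficients(lambda x: x * x, N=4)
print(a)  # compare with 2*pi**2/3 and 4*(-1)**n/n**2
print(b)  # approximately zero, since x**2 * sin(nx) is odd
```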

We can also consider the following reverse approach: if the trigonometric series in (11.4) is uniformly convergent to $f(x)$ on $[-\pi, \pi]$, that is,

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos nx + b_n\sin nx\right], \tag{11.5}$$

then $a_n$ and $b_n$ are given by formulas (11.2) and (11.3). In this case, the derivation of $a_n$ and $b_n$ is obtained by multiplying both sides of (11.5) by $\cos nx$ and $\sin nx$, respectively, followed by integration over $[-\pi, \pi]$. More specifically, to show formula (11.2), we multiply both sides of (11.5) by $\cos nx$. For $n \ne 0$, we then have

$$f(x)\cos nx = \frac{a_0}{2}\cos nx + \sum_{k=1}^{\infty}\left[a_k\cos kx\cos nx + b_k\sin kx\cos nx\right]. \tag{11.6}$$

Since the series on the right-hand side converges uniformly, it can be integrated term by term (this can be easily proved by applying Theorem 6.6.1 to the sequence whose $n$th term is the $n$th partial sum of the series). We then have

$$\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{a_0}{2}\int_{-\pi}^{\pi}\cos nx\,dx + \sum_{k=1}^{\infty}\left[a_k\int_{-\pi}^{\pi}\cos kx\cos nx\,dx + b_k\int_{-\pi}^{\pi}\sin kx\cos nx\,dx\right]. \tag{11.7}$$

But

$$\int_{-\pi}^{\pi}\cos nx\,dx = 0, \qquad n = 1, 2, \ldots, \tag{11.8}$$

$$\int_{-\pi}^{\pi}\sin kx\cos nx\,dx = 0, \tag{11.9}$$

$$\int_{-\pi}^{\pi}\cos kx\cos nx\,dx = \begin{cases} 0, & k \ne n, \\ \pi, & k = n \ge 1. \end{cases} \tag{11.10}$$



From (11.7)-(11.10) we conclude (11.2). Note that formulas (11.9) and (11.10) can be shown to be true by recalling the following trigonometric identities:

$$\sin kx\cos nx = \tfrac{1}{2}\left\{\sin[(k+n)x] + \sin[(k-n)x]\right\},$$

$$\cos kx\cos nx = \tfrac{1}{2}\left\{\cos[(k+n)x] + \cos[(k-n)x]\right\}.$$

For $n = 0$ we obtain from (11.7),

$$\int_{-\pi}^{\pi} f(x)\,dx = a_0\pi.$$

Formula (11.3) for $b_n$ can be proved similarly. We can therefore state the following conclusion: a uniformly convergent trigonometric series is the Fourier series of its sum.


Note. If the series in (11.1) converges or diverges at a point $x_0$, then it converges or diverges at $x_0 \pm 2n\pi$ ($n = 1, 2, \ldots$) due to the periodic nature of the sine and cosine functions. Thus, if the series (11.1) represents a function $f(x)$ on $[-\pi, \pi]$, then the series also represents the so-called periodic extension of $f(x)$ for all values of $x$. Geometrically speaking, the periodic extension of $f(x)$ is obtained by shifting the graph of $f(x)$ on $[-\pi, \pi]$ by $2\pi, 4\pi, \ldots$ to the right and to the left. For example, for $-3\pi < x < -\pi$, $f(x)$ is defined by $f(x + 2\pi)$, and for $\pi < x < 3\pi$, $f(x)$ is defined by $f(x - 2\pi)$, etc. This defines $f(x)$ for all $x$ as a periodic function with period $2\pi$.

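Example 11.1.1. Let $f(x)$ be defined on $[-\pi, \pi]$ by the formula

$$f(x) = \begin{cases} 0, & -\pi \le x < 0, \\ \dfrac{x}{\pi}, & 0 \le x \le \pi. \end{cases}$$

Then, from (11.2) we have

$$\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = \frac{1}{\pi^2}\int_0^{\pi} x\cos nx\,dx \\
&= \frac{1}{\pi^2}\left\{\left[\frac{x\sin nx}{n}\right]_0^{\pi} - \frac{1}{n}\int_0^{\pi}\sin nx\,dx\right\} \\
&= \frac{1}{\pi^2 n^2}(\cos n\pi - 1).
\end{aligned}$$

Thus, for $n = 1, 2, \ldots$,

$$a_n = \begin{cases} 0, & n \text{ even}, \\ -2/(\pi^2 n^2), & n \text{ odd}. \end{cases}$$

Also, from (11.3), we get

$$\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = \frac{1}{\pi^2}\int_0^{\pi} x\sin nx\,dx \\
&= \frac{1}{\pi^2}\left\{\left[-\frac{x\cos nx}{n}\right]_0^{\pi} + \frac{1}{n}\int_0^{\pi}\cos nx\,dx\right\} \\
&= \frac{1}{\pi^2}\left\{-\frac{\pi}{n}(-1)^n + \left[\frac{\sin nx}{n^2}\right]_0^{\pi}\right\} \\
&= \frac{(-1)^{n+1}}{\pi n}.
\end{aligned}$$

For $n = 0$, $a_0$ is given by

$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx = \frac{1}{\pi^2}\int_0^{\pi} x\,dx = \frac{1}{2}.$$

The Fourier series of $f(x)$ is then of the form

$$f(x) \sim \frac{1}{4} - \frac{2}{\pi^2}\sum_{n=1}^{\infty}\frac{1}{(2n-1)^2}\cos[(2n-1)x] + \frac{1}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin nx.$$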
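To see this series at work, the sketch below sums its first $N$ terms (a truncation level we choose freely) at a few points. At $x = \pi/2$, where $f$ is continuous, the partial sums should approach $f(\pi/2) = 1/2$. At $x = \pi$, the periodic extension jumps from $f(\pi^-) = 1$ to $f(-\pi^+) = 0$, and, anticipating Theorem 11.2.1 in Section 11.2, the partial sums approach the midpoint $1/2$ of the jump instead of $f(\pi) = 1$.

```python
import math


def partial_sum_example_11_1_1(x, N):
    """Constant a_0/2 = 1/4 plus the terms n = 1, ..., N of both series
    in the Fourier expansion of Example 11.1.1."""
    s = 0.25
    for n in range(1, N + 1):
        s -= 2 / math.pi ** 2 * math.cos((2 * n - 1) * x) / (2 * n - 1) ** 2
        s += (-1) ** (n + 1) * math.sin(n * x) / (math.pi * n)
    return s


for N in (10, 100, 1000):
    print(N, partial_sum_example_11_1_1(math.pi / 2, N),
          partial_sum_example_11_1_1(math.pi, N))
```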


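Example 11.1.2. Let $f(x) = x^2$ $(-\pi \le x \le \pi)$. Then

$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3},$$

$$\begin{aligned}
a_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\cos nx\,dx \\
&= \left[\frac{x^2\sin nx}{\pi n}\right]_{-\pi}^{\pi} - \frac{2}{\pi n}\int_{-\pi}^{\pi} x\sin nx\,dx \\
&= -\frac{2}{\pi n}\int_{-\pi}^{\pi} x\sin nx\,dx \\
&= \left[\frac{2x\cos nx}{\pi n^2}\right]_{-\pi}^{\pi} - \frac{2}{\pi n^2}\int_{-\pi}^{\pi}\cos nx\,dx \\
&= \frac{4\cos n\pi}{n^2} = \frac{4(-1)^n}{n^2}, \qquad n = 1, 2, \ldots,
\end{aligned}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\sin nx\,dx = 0,$$

since $x^2\sin nx$ is an odd function. Thus, the Fourier expansion of $f(x)$ is

$$x^2 \sim \frac{\pi^2}{3} - 4\left(\cos x - \frac{\cos 2x}{2^2} + \frac{\cos 3x}{3^2} - \cdots\right).$$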


11.2. CONVERGENCE OF FOURIER SERIES 

In this section, we consider the conditions under which the Fourier series of $f(x)$ converges to $f(x)$. We shall assume that $f(x)$ is Riemann integrable on $[-\pi, \pi]$. Hence, $f^2(x)$ is also Riemann integrable on $[-\pi, \pi]$ by Corollary 6.4.1. This condition will be satisfied if, for example, $f(x)$ is continuous on $[-\pi, \pi]$, or if it has a finite number of discontinuities of the first kind (see Definition 3.4.2) in this interval.

In order to study the convergence of Fourier series, the following lemmas are needed:

Lemma 11.2.1. If $f(x)$ is Riemann integrable on $[-\pi, \pi]$, then

$$\lim_{n\to\infty}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = 0, \tag{11.11}$$

$$\lim_{n\to\infty}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = 0. \tag{11.12}$$



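Proof. Let $s_n(x)$ denote the following partial sum of the Fourier series of $f(x)$:

$$s_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n}\left[a_k\cos kx + b_k\sin kx\right], \tag{11.13}$$

where $a_k$ ($k = 0, 1, \ldots, n$) and $b_k$ ($k = 1, 2, \ldots, n$) are given by (11.2) and (11.3), respectively. Then

$$\begin{aligned}
\int_{-\pi}^{\pi} f(x)s_n(x)\,dx &= \frac{a_0}{2}\int_{-\pi}^{\pi} f(x)\,dx + \sum_{k=1}^{n}\left[a_k\int_{-\pi}^{\pi} f(x)\cos kx\,dx + b_k\int_{-\pi}^{\pi} f(x)\sin kx\,dx\right] \\
&= \frac{\pi a_0^2}{2} + \pi\sum_{k=1}^{n}\left(a_k^2 + b_k^2\right).
\end{aligned}$$

It can also be verified that

$$\int_{-\pi}^{\pi} s_n^2(x)\,dx = \frac{\pi a_0^2}{2} + \pi\sum_{k=1}^{n}\left(a_k^2 + b_k^2\right).$$

Consequently,

$$\begin{aligned}
\int_{-\pi}^{\pi}\left[f(x) - s_n(x)\right]^2 dx &= \int_{-\pi}^{\pi} f^2(x)\,dx - 2\int_{-\pi}^{\pi} f(x)s_n(x)\,dx + \int_{-\pi}^{\pi} s_n^2(x)\,dx \\
&= \int_{-\pi}^{\pi} f^2(x)\,dx - \left[\frac{\pi a_0^2}{2} + \pi\sum_{k=1}^{n}\left(a_k^2 + b_k^2\right)\right].
\end{aligned}$$

Since the left-hand side is nonnegative, it follows that

$$\frac{a_0^2}{2} + \sum_{k=1}^{n}\left(a_k^2 + b_k^2\right) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f^2(x)\,dx. \tag{11.14}$$

Since the right-hand side of (11.14) exists and is independent of $n$, the series $\sum_{k=1}^{\infty}(a_k^2 + b_k^2)$ must be convergent. This follows from applying Theorem 5.1.2 and the fact that the sequence

$$s_n^* = \sum_{k=1}^{n}\left(a_k^2 + b_k^2\right)$$

is bounded and monotone increasing. But the convergence of $\sum_{k=1}^{\infty}(a_k^2 + b_k^2)$ implies that $\lim_{k\to\infty}(a_k^2 + b_k^2) = 0$ (see Result 5.2.1 in Chapter 5). Hence,

$$\lim_{k\to\infty} a_k = 0, \qquad \lim_{k\to\infty} b_k = 0. \qquad \square$$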
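Inequality (11.14), usually called Bessel's inequality, can be watched numerically. For $f(x) = x$ on $[-\pi, \pi]$, Example 11.2.1 gives $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, while $(1/\pi)\int_{-\pi}^{\pi} x^2\,dx = 2\pi^2/3$. The sketch checks that every partial sum of the left-hand side stays below this bound. For this function the partial sums in fact creep up toward the bound, a stronger property (Parseval's identity) that (11.14) by itself does not assert.

```python
import math

# Fourier coefficients of f(x) = x on [-pi, pi] (Example 11.2.1):
# a_n = 0 for all n and b_n = 2(-1)^(n+1)/n, n >= 1.
bound = (1 / math.pi) * (2 * math.pi ** 3 / 3)  # (1/pi) * integral of x^2 = 2*pi^2/3

partial = 0.0  # a_0^2 / 2 = 0 here
history = []   # history[n-1] is the left-hand side of (11.14) with n terms
for n in range(1, 10001):
    b_n = 2 * (-1) ** (n + 1) / n
    partial += b_n ** 2  # a_n^2 + b_n^2 with a_n = 0
    history.append(partial)

print(history[9], history[-1], bound)
```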



Corollary 11.2.1. If $\phi(x)$ is Riemann integrable on $[-\pi, \pi]$, then

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = 0. \tag{11.15}$$

Proof. We have that

$$\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right] = \left[\phi(x)\cos\frac{x}{2}\right]\sin nx + \left[\phi(x)\sin\frac{x}{2}\right]\cos nx.$$

Let $\phi_1(x) = \phi(x)\sin(x/2)$, $\phi_2(x) = \phi(x)\cos(x/2)$. Both $\phi_1(x)$ and $\phi_2(x)$ are Riemann integrable by Corollary 6.4.2. By applying Lemma 11.2.1 to both $\phi_1(x)$ and $\phi_2(x)$, we obtain

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\phi_1(x)\cos nx\,dx = 0, \tag{11.16}$$

$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\phi_2(x)\sin nx\,dx = 0. \tag{11.17}$$

Formula (11.15) follows from the addition of (11.16) and (11.17). □


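Corollary 11.2.2. If $\phi(x)$ is Riemann integrable on $[-\pi, \pi]$, then

$$\lim_{n\to\infty}\int_{-\pi}^{0}\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = 0,$$

$$\lim_{n\to\infty}\int_{0}^{\pi}\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = 0.$$

Proof. Define the functions $h_1(x)$ and $h_2(x)$ as

$$h_1(x) = \begin{cases} 0, & 0 \le x \le \pi, \\ \phi(x), & -\pi \le x < 0, \end{cases} \qquad h_2(x) = \begin{cases} \phi(x), & 0 \le x \le \pi, \\ 0, & -\pi \le x < 0. \end{cases}$$

Both $h_1(x)$ and $h_2(x)$ are Riemann integrable on $[-\pi, \pi]$. Hence, by Corollary 11.2.1,

$$\lim_{n\to\infty}\int_{-\pi}^{0}\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = \lim_{n\to\infty}\int_{-\pi}^{\pi}h_1(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = 0,$$

$$\lim_{n\to\infty}\int_{0}^{\pi}\phi(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = \lim_{n\to\infty}\int_{-\pi}^{\pi}h_2(x)\sin\left[\left(n + \tfrac{1}{2}\right)x\right]dx = 0. \qquad \square$$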



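Lemma 11.2.2.

$$\frac{1}{2} + \sum_{k=1}^{n}\cos ku = \frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}.$$

Proof. Let $G_n(u)$ be defined as

$$G_n(u) = \frac{1}{2} + \sum_{k=1}^{n}\cos ku.$$

Multiplying both sides by $2\sin(u/2)$ and using the identity

$$2\sin\frac{u}{2}\cos ku = \sin\left[\left(k + \tfrac{1}{2}\right)u\right] - \sin\left[\left(k - \tfrac{1}{2}\right)u\right], \tag{11.18}$$

we obtain

$$\begin{aligned}
2\sin\frac{u}{2}\,G_n(u) &= \sin\frac{u}{2} + \sum_{k=1}^{n}\left\{\sin\left[\left(k + \tfrac{1}{2}\right)u\right] - \sin\left[\left(k - \tfrac{1}{2}\right)u\right]\right\} \\
&= \sin\left[\left(n + \tfrac{1}{2}\right)u\right].
\end{aligned}$$

Hence, if $\sin(u/2) \ne 0$, then

$$G_n(u) = \frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}. \qquad \square$$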
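Lemma 11.2.2 (the closed form of what is often called the Dirichlet kernel) is easy to spot-check numerically. The sketch compares both sides at a few arbitrarily chosen points $u$ with $\sin(u/2) \ne 0$; the helper names `lhs` and `rhs` are ours.

```python
import math


def lhs(n, u):
    """G_n(u) = 1/2 + sum_{k=1}^{n} cos(k u)."""
    return 0.5 + sum(math.cos(k * u) for k in range(1, n + 1))


def rhs(n, u):
    """sin((n + 1/2) u) / (2 sin(u/2)); valid only when sin(u/2) != 0."""
    return math.sin((n + 0.5) * u) / (2 * math.sin(u / 2))


for n in (1, 5, 50):
    for u in (0.3, 1.7, -2.9, 3.0):
        print(n, u, lhs(n, u), rhs(n, u))
```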


Theorem 11.2.1. Let $f(x)$ be Riemann integrable on $[-\pi, \pi]$, and let it be extended periodically outside this interval. Suppose that at a point $x$, $f(x)$ satisfies the following two conditions:

i. Both $f(x^-)$ and $f(x^+)$ exist, where $f(x^-)$ and $f(x^+)$ are the left-sided and right-sided limits of $f(x)$, and

$$f(x) = \tfrac{1}{2}\left[f(x^-) + f(x^+)\right]. \tag{11.19}$$

ii. Both one-sided derivatives,

$$f'(x^+) = \lim_{h\to 0^+}\frac{f(x+h) - f(x^+)}{h}, \qquad f'(x^-) = \lim_{h\to 0^-}\frac{f(x+h) - f(x^-)}{h},$$

exist.

Then the Fourier series of $f(x)$ converges to $f(x)$ at $x$, that is,

$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos nx + b_n\sin nx\right] = \begin{cases} f(x) & \text{if } x \text{ is a point of continuity}, \\ \tfrac{1}{2}\left[f(x^+) + f(x^-)\right] & \text{if } x \text{ is a point of discontinuity of the first kind}. \end{cases}$$

Before proving this theorem, it should be noted that if $f(x)$ is continuous at $x$, then condition (11.19) is satisfied. If, however, $x$ is a point of discontinuity of $f(x)$ of the first kind, then $f(x)$ is defined to be equal to the right-hand side of (11.19). Such a definition of $f(x)$ does not affect the values of the Fourier coefficients, $a_n$ and $b_n$, in (11.2) and (11.3). Hence, the Fourier series of $f(x)$ remains unchanged.

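Proof of Theorem 11.2.1. From (11.2), (11.3), and (11.13), we have

$$\begin{aligned}
s_n(x) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,dt + \sum_{k=1}^{n}\left[\left(\frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos kt\,dt\right)\cos kx + \left(\frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin kt\,dt\right)\sin kx\right] \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\left[\frac{1}{2} + \sum_{k=1}^{n}\left(\cos kt\cos kx + \sin kt\sin kx\right)\right]dt \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\left[\frac{1}{2} + \sum_{k=1}^{n}\cos k(t - x)\right]dt.
\end{aligned}$$

Using Lemma 11.2.2, $s_n(x)$ can be written as

$$s_n(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,\frac{\sin\left[\left(n + \frac{1}{2}\right)(t - x)\right]}{2\sin[(t - x)/2]}\,dt. \tag{11.20}$$

If we make the change of variable $t - x = u$ in (11.20), we obtain

$$s_n(x) = \frac{1}{\pi}\int_{-\pi-x}^{\pi-x} f(x + u)\,\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du.$$

Since both $f(x + u)$ and $\sin[(n + \frac{1}{2})u]/[2\sin(u/2)]$ have period $2\pi$ with respect to $u$, the integral from $-\pi - x$ to $\pi - x$ has the same value as the one from $-\pi$ to $\pi$. Thus,

$$s_n(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + u)\,\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du. \tag{11.21}$$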



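We now need to show that for each $x$,

$$\lim_{n\to\infty} s_n(x) = \tfrac{1}{2}\left[f(x^-) + f(x^+)\right].$$

Formula (11.21) can be written as

$$\begin{aligned}
s_n(x) &= \frac{1}{\pi}\int_{-\pi}^{0} f(x + u)\,\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du + \frac{1}{\pi}\int_{0}^{\pi} f(x + u)\,\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du \\
&= \frac{1}{\pi}\int_{-\pi}^{0}\left[f(x + u) - f(x^-)\right]\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du + \frac{f(x^-)}{\pi}\int_{-\pi}^{0}\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du \\
&\quad + \frac{1}{\pi}\int_{0}^{\pi}\left[f(x + u) - f(x^+)\right]\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du + \frac{f(x^+)}{\pi}\int_{0}^{\pi}\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du.
\end{aligned} \tag{11.22}$$

The first integral in (11.22) can be expressed as

$$\frac{1}{\pi}\int_{-\pi}^{0}\left[f(x + u) - f(x^-)\right]\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du = \frac{1}{\pi}\int_{-\pi}^{0}\frac{f(x + u) - f(x^-)}{u}\cdot\frac{u}{2\sin(u/2)}\,\sin\left[\left(n + \tfrac{1}{2}\right)u\right]du.$$

We note that the function

$$\frac{f(x + u) - f(x^-)}{u}\cdot\frac{u}{2\sin(u/2)}$$

is Riemann integrable on $[-\pi, 0]$, and at $u = 0$ it has a discontinuity of the first kind, since

$$\lim_{u\to 0^-}\frac{f(x + u) - f(x^-)}{u} = f'(x^-), \qquad \lim_{u\to 0^-}\frac{u}{2\sin(u/2)} = 1,$$

that is, both limits are finite. Consequently, by applying Corollary 11.2.2 to the function $\{[f(x + u) - f(x^-)]/u\}\cdot u/[2\sin(u/2)]$, we get

$$\lim_{n\to\infty}\frac{1}{\pi}\int_{-\pi}^{0}\left[f(x + u) - f(x^-)\right]\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du = 0. \tag{11.23}$$

We can similarly show that the third integral in (11.22) has a limit equal to zero as $n \to \infty$, that is,

$$\lim_{n\to\infty}\frac{1}{\pi}\int_{0}^{\pi}\left[f(x + u) - f(x^+)\right]\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du = 0. \tag{11.24}$$

Furthermore, from Lemma 11.2.2, we have

$$\int_{-\pi}^{0}\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du = \int_{-\pi}^{0}\left[\frac{1}{2} + \sum_{k=1}^{n}\cos ku\right]du = \frac{\pi}{2}, \tag{11.25}$$

$$\int_{0}^{\pi}\frac{\sin\left[\left(n + \frac{1}{2}\right)u\right]}{2\sin(u/2)}\,du = \int_{0}^{\pi}\left[\frac{1}{2} + \sum_{k=1}^{n}\cos ku\right]du = \frac{\pi}{2}. \tag{11.26}$$

From (11.22)-(11.26), we conclude that

$$\lim_{n\to\infty} s_n(x) = \tfrac{1}{2}\left[f(x^-) + f(x^+)\right]. \qquad \square$$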


Definition 11.2.1. A function $f(x)$ is said to be piecewise continuous on $[a, b]$ if it is continuous on $[a, b]$ except for a finite number of discontinuities of the first kind in $[a, b]$, and, in addition, both $f(a^+)$ and $f(b^-)$ exist. □

Corollary 11.2.3. Suppose that $f(x)$ is piecewise continuous on $[-\pi, \pi]$, and that it can be extended periodically outside this interval. In addition, if, at each interior point of $[-\pi, \pi]$, $f'(x^+)$ and $f'(x^-)$ exist and $f'(-\pi^+)$ and $f'(\pi^-)$ exist, then at a point $x$, the Fourier series of $f(x)$ converges to $\frac{1}{2}[f(x^-) + f(x^+)]$.

Proof. This follows directly from applying Theorem 11.2.1 and the fact that a piecewise continuous function on $[a, b]$ is Riemann integrable there. □



Example 11.2.1. Let $f(x) = x$ $(-\pi < x < \pi)$. The periodic extension of $f(x)$ (outside the interval $[-\pi, \pi]$) is defined everywhere. In this case,

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\cos nx\,dx = 0, \qquad n = 0, 1, 2, \ldots,$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin nx\,dx = \frac{2(-1)^{n+1}}{n}, \qquad n = 1, 2, \ldots.$$

Hence, the Fourier series of $f(x)$ is

$$x \sim \sum_{n=1}^{\infty}\frac{2(-1)^{n+1}}{n}\sin nx.$$

This series converges to $x$ at each $x$ in $(-\pi, \pi)$. At $x = -\pi, \pi$ we have discontinuities of the first kind for the periodic extension of $f(x)$. Hence, at $x = \pi$, the Fourier series converges to

$$\tfrac{1}{2}\left[f(\pi^-) + f(\pi^+)\right] = \tfrac{1}{2}\left[\pi + (-\pi)\right] = 0.$$

Similarly, at $x = -\pi$, the series converges to

$$\tfrac{1}{2}\left[f(-\pi^-) + f(-\pi^+)\right] = \tfrac{1}{2}\left[\pi + (-\pi)\right] = 0.$$

For other values of $x$, the series converges to the value of the periodic extension of $f(x)$.
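The convergence behavior described in Example 11.2.1 can be observed directly. The sketch below (truncation levels chosen by us) evaluates partial sums of $\sum 2(-1)^{n+1}\sin(nx)/n$ at interior points, where they slowly approach $x$, and at $x = \pi$, where every term vanishes and the sum equals the midpoint value 0 of the jump.

```python
import math


def partial_sum_x(x, N):
    """s_N(x) = sum_{n=1}^{N} 2(-1)^(n+1) sin(nx)/n, the N-term Fourier sum of f(x) = x."""
    return sum(2 * (-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))


for N in (10, 100, 1000, 10000):
    print(N, partial_sum_x(1.0, N), partial_sum_x(math.pi, N))
```

The error at an interior point shrinks only like $1/N$, which is typical of a function whose periodic extension has a jump.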

Example 11.2.2. Consider the function $f(x) = x^2$ defined in Example 11.1.2. Its Fourier series is

$$x^2 \sim \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos nx.$$

The periodic extension of $f(x)$ is continuous everywhere. We can therefore write

$$x^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos nx.$$

In particular, for $x = \pm\pi$, we have

$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^{2n}}{n^2},$$

that is,

$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty}\frac{1}{n^2}.$$
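A quick numerical check of the last two displays: the sketch sums the Fourier series of $x^2$ at $x = \pi$ and at an interior point, and compares partial sums of $\sum 1/n^2$ with $\pi^2/6$. The truncation levels are arbitrary; the error of the $N$th partial sum of $\sum 1/n^2$ is roughly $1/N$.

```python
import math


def fourier_x_squared(x, N):
    """N-term partial sum of pi^2/3 + 4 * sum (-1)^n cos(nx) / n^2."""
    return math.pi ** 2 / 3 + 4 * sum((-1) ** n * math.cos(n * x) / n ** 2
                                      for n in range(1, N + 1))


for N in (10, 100, 1000):
    basel = sum(1 / n ** 2 for n in range(1, N + 1))
    print(N, fourier_x_squared(math.pi, N), math.pi ** 2,
          basel, math.pi ** 2 / 6)
```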


11.3. DIFFERENTIATION AND INTEGRATION 
OF FOURIER SERIES 

In Section 11.2 conditions were given under which a function $f(x)$ defined on $[-\pi, \pi]$ is represented as a Fourier series. In this section, we discuss the conditions under which the series can be differentiated or integrated term by term.

Theorem 11.3.1. Let $a_0/2 + \sum_{n=1}^{\infty}[a_n\cos nx + b_n\sin nx]$ be the Fourier series of $f(x)$. If $f(x)$ is continuous on $[-\pi, \pi]$, $f(-\pi) = f(\pi)$, and $f'(x)$ is piecewise continuous on $[-\pi, \pi]$, then

a. at each point where $f''(x)$ exists, $f'(x)$ can be represented by the derivative of the Fourier series of $f(x)$, where differentiation is done term by term, that is,

$$f'(x) = \sum_{n=1}^{\infty}\left[nb_n\cos nx - na_n\sin nx\right];$$

b. the Fourier series of $f(x)$ converges uniformly and absolutely to $f(x)$ on $[-\pi, \pi]$.


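Proof.

a. The Fourier series of $f(x)$ converges to $f(x)$ by Corollary 11.2.3. Thus,

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos nx + b_n\sin nx\right].$$

The periodic extension of $f(x)$ is continuous, since $f(\pi) = f(-\pi)$. Furthermore, the derivative, $f'(x)$, of $f(x)$ satisfies the conditions of Corollary 11.2.3. Hence, the Fourier series of $f'(x)$ converges to $f'(x)$, that is,

$$f'(x) = \frac{\alpha_0}{2} + \sum_{n=1}^{\infty}\left[\alpha_n\cos nx + \beta_n\sin nx\right], \tag{11.27}$$

where

$$\alpha_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\,dx = \frac{1}{\pi}\left[f(\pi) - f(-\pi)\right] = 0,$$

$$\begin{aligned}
\alpha_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\cos nx\,dx = \frac{1}{\pi}\Big[f(x)\cos nx\Big]_{-\pi}^{\pi} + \frac{n}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx \\
&= \frac{(-1)^n}{\pi}\left[f(\pi) - f(-\pi)\right] + nb_n = nb_n,
\end{aligned}$$

$$\begin{aligned}
\beta_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\sin nx\,dx = \frac{1}{\pi}\Big[f(x)\sin nx\Big]_{-\pi}^{\pi} - \frac{n}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx \\
&= -na_n.
\end{aligned}$$

By substituting $\alpha_n$ and $\beta_n$ in (11.27), we obtain

$$f'(x) = \sum_{n=1}^{\infty}\left[nb_n\cos nx - na_n\sin nx\right].$$

b. Consider the Fourier series of $f'(x)$ in (11.27), where $\alpha_0 = 0$, $\alpha_n = nb_n$, $\beta_n = -na_n$. Then, using inequality (11.14), we obtain

$$\sum_{n=1}^{\infty}\left(\alpha_n^2 + \beta_n^2\right) \le \frac{1}{\pi}\int_{-\pi}^{\pi}\left[f'(x)\right]^2 dx. \tag{11.28}$$

Inequality (11.28) indicates that $\sum_{n=1}^{\infty}(\alpha_n^2 + \beta_n^2) = \sum_{n=1}^{\infty} n^2(a_n^2 + b_n^2)$ is a convergent series. Now, let $s_n(x) = a_0/2 + \sum_{k=1}^{n}[a_k\cos kx + b_k\sin kx]$. Then, for $n \ge m + 1$,

$$|s_n(x) - s_m(x)| = \left|\sum_{k=m+1}^{n}\left[a_k\cos kx + b_k\sin kx\right]\right| \le \sum_{k=m+1}^{n}\left|a_k\cos kx + b_k\sin kx\right|.$$

Note that

$$\left|a_k\cos kx + b_k\sin kx\right| \le \left(a_k^2 + b_k^2\right)^{1/2}. \tag{11.29}$$

Inequality (11.29) follows from the fact that $a_k\cos kx + b_k\sin kx$ is the dot product, $\mathbf{u}\cdot\mathbf{v}$, of the vectors $\mathbf{u} = (a_k, b_k)'$, $\mathbf{v} = (\cos kx, \sin kx)'$, and $|\mathbf{u}\cdot\mathbf{v}| \le \|\mathbf{u}\|_2\|\mathbf{v}\|_2$ (see Theorem 2.1.2). Hence,

$$|s_n(x) - s_m(x)| \le \sum_{k=m+1}^{n}\left(a_k^2 + b_k^2\right)^{1/2} = \sum_{k=m+1}^{n}\frac{1}{k}\left(\alpha_k^2 + \beta_k^2\right)^{1/2}. \tag{11.30}$$

But, by the Cauchy-Schwarz inequality,

$$\sum_{k=m+1}^{n}\frac{1}{k}\left(\alpha_k^2 + \beta_k^2\right)^{1/2} \le \left[\sum_{k=m+1}^{n}\frac{1}{k^2}\right]^{1/2}\left[\sum_{k=m+1}^{n}\left(\alpha_k^2 + \beta_k^2\right)\right]^{1/2},$$

and by (11.28),

$$\sum_{k=m+1}^{n}\left(\alpha_k^2 + \beta_k^2\right) \le \frac{1}{\pi}\int_{-\pi}^{\pi}\left[f'(x)\right]^2 dx.$$

In addition, $\sum_{k=1}^{\infty} 1/k^2$ is a convergent series. Hence, by the Cauchy criterion, for a given $\epsilon > 0$, there exists a positive integer $N$ such that $\sum_{k=m+1}^{n} 1/k^2 < \epsilon^2$ if $n > m > N$. Hence,

$$|s_n(x) - s_m(x)| \le \sum_{k=m+1}^{n}\left|a_k\cos kx + b_k\sin kx\right| \le c\,\epsilon, \qquad \text{if } n > m > N, \tag{11.31}$$

where $c = \{(1/\pi)\int_{-\pi}^{\pi}[f'(x)]^2\,dx\}^{1/2}$. The double inequality (11.31) shows that the Fourier series of $f(x)$ converges absolutely and uniformly to $f(x)$ on $[-\pi, \pi]$ by the Cauchy criterion. □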

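Note that from (11.30) we can also conclude that $\sum_{k=1}^{\infty}(a_k^2 + b_k^2)^{1/2}$ satisfies the Cauchy criterion. This series is therefore convergent. Furthermore, since $(|a_k| + |b_k|)^2 \le 2(a_k^2 + b_k^2)$, it is easy to see that

$$\sum_{k=1}^{\infty}\left(|a_k| + |b_k|\right) \le \sum_{k=1}^{\infty}\left[2\left(a_k^2 + b_k^2\right)\right]^{1/2}.$$

This indicates that the series $\sum_{k=1}^{\infty}(|a_k| + |b_k|)$ is convergent by the comparison test.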

Note that, in general, we should not expect that a term-by-term differentiation of a Fourier series of $f(x)$ will result in a Fourier series of $f'(x)$. For example, for the function $f(x) = x$, $-\pi < x < \pi$, the Fourier series is (see Example 11.2.1)

$$x \sim \sum_{n=1}^{\infty}\frac{2(-1)^{n+1}}{n}\sin nx.$$

Differentiating this series term by term, we obtain $\sum_{n=1}^{\infty} 2(-1)^{n+1}\cos nx$. This, however, is not the Fourier series of $f'(x) = 1$, since the Fourier series of $f'(x) = 1$ is just 1. Note that in this case, $f(\pi) \ne f(-\pi)$, which violates one of the conditions in Theorem 11.3.1.
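The contrast can be seen numerically. For $f(x) = x^2$, which satisfies the hypotheses of Theorem 11.3.1, the term-by-term derivative of the series in Example 11.2.2 is $\sum_{n=1}^{\infty} 4(-1)^{n+1}\sin(nx)/n$, and its partial sums approach $f'(x) = 2x$. For $f(x) = x$, the differentiated series $\sum 2(-1)^{n+1}\cos nx$ does not converge at all; at $x = 0$ its partial sums simply alternate between 2 and 0. The sketch below, with truncation levels of our choosing, shows both.

```python
import math


def diff_series_x_squared(x, N):
    """Term-by-term derivative of the Fourier series of x^2 (Example 11.2.2),
    truncated after N terms: sum 4(-1)^(n+1) sin(nx)/n."""
    return sum(4 * (-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))


def diff_series_x(x, N):
    """Term-by-term derivative of the Fourier series of x (Example 11.2.1),
    truncated after N terms: sum 2(-1)^(n+1) cos(nx)."""
    return sum(2 * (-1) ** (n + 1) * math.cos(n * x) for n in range(1, N + 1))


print([diff_series_x_squared(1.0, N) for N in (10, 100, 1000, 10000)])  # approaches 2x = 2
print([diff_series_x(0.0, N) for N in range(1, 7)])  # 2.0, 0.0, 2.0, ...: no limit
```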


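Theorem 11.3.2. If $f(x)$ is piecewise continuous on $[-\pi, \pi]$ and has the Fourier series

$$\frac{a_0}{2} + \sum_{n=1}^{\infty}[a_n\cos nx + b_n\sin nx], \qquad (11.32)$$

then a term-by-term integration of this series gives the Fourier series of $\int_{-\pi}^{x}f(t)\,dt$ for $x \in [-\pi, \pi]$, that is,

$$\int_{-\pi}^{x}f(t)\,dt = \frac{a_0(\pi + x)}{2} + \sum_{n=1}^{\infty}\left[\frac{a_n}{n}\sin nx - \frac{b_n}{n}(\cos nx - \cos n\pi)\right], \quad -\pi \le x \le \pi.$$

Furthermore, the integrated series converges uniformly to $\int_{-\pi}^{x}f(t)\,dt$.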


Proof. Define the function $g(x)$ as

$$g(x) = \int_{-\pi}^{x}f(t)\,dt - \frac{a_0}{2}x. \qquad (11.33)$$

If $f(x)$ is piecewise continuous on $[-\pi, \pi]$, then it is Riemann integrable there, and by Theorem 6.4.7, $g(x)$ is continuous on $[-\pi, \pi]$. Furthermore, by Theorem 6.4.8, at each point where $f(x)$ is continuous, $g(x)$ is differentiable and

$$g'(x) = f(x) - \frac{a_0}{2}. \qquad (11.34)$$

This implies that $g'(x)$ is piecewise continuous on $[-\pi, \pi]$. In addition, $g(-\pi) = g(\pi)$. To show this, we have from (11.33)

$$g(-\pi) = \int_{-\pi}^{-\pi}f(t)\,dt + \frac{a_0}{2}\pi = \frac{a_0}{2}\pi,$$

$$g(\pi) = \int_{-\pi}^{\pi}f(t)\,dt - \frac{a_0}{2}\pi = a_0\pi - \frac{a_0}{2}\pi = \frac{a_0}{2}\pi,$$

by the definition of $a_0$. Thus, the function $g(x)$ satisfies the conditions of Theorem 11.3.1. It follows that the Fourier series of $g(x)$ converges uniformly to $g(x)$ on $[-\pi, \pi]$. We therefore have

$$g(x) = \frac{A_0}{2} + \sum_{n=1}^{\infty}[A_n\cos nx + B_n\sin nx]. \qquad (11.35)$$

Moreover, by part (a) of Theorem 11.3.1, the Fourier series of $g'(x)$ is

$$\sum_{n=1}^{\infty}[nB_n\cos nx - nA_n\sin nx]. \qquad (11.36)$$


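Then, from (11.32), (11.34), and (11.36), and since by (11.34) the Fourier series of $g'(x)$ is that of $f(x)$ with the constant term removed, we obtain

$$a_n = nB_n, \quad n = 1, 2, \ldots, \qquad b_n = -nA_n, \quad n = 1, 2, \ldots.$$

Substituting in (11.35), we get

$$g(x) = \frac{A_0}{2} + \sum_{n=1}^{\infty}\left[-\frac{b_n}{n}\cos nx + \frac{a_n}{n}\sin nx\right].$$

From (11.33) we then have

$$\int_{-\pi}^{x}f(t)\,dt = \frac{a_0x}{2} + \frac{A_0}{2} + \sum_{n=1}^{\infty}\left[-\frac{b_n}{n}\cos nx + \frac{a_n}{n}\sin nx\right]. \qquad (11.37)$$

To find the value of $A_0$, we set $x = -\pi$ in (11.37), which gives

$$0 = -\frac{a_0\pi}{2} + \frac{A_0}{2} - \sum_{n=1}^{\infty}\frac{b_n}{n}\cos n\pi.$$

Hence,

$$\frac{A_0}{2} = \frac{a_0\pi}{2} + \sum_{n=1}^{\infty}\frac{b_n}{n}\cos n\pi.$$

Substituting $A_0/2$ in (11.37), we finally obtain

$$\int_{-\pi}^{x}f(t)\,dt = \frac{a_0(\pi + x)}{2} + \sum_{n=1}^{\infty}\left[\frac{a_n}{n}\sin nx - \frac{b_n}{n}(\cos nx - \cos n\pi)\right].$$

□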
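A quick numerical check of Theorem 11.3.2, sketched in Python under the assumption $f(t) = t$, whose coefficients $a_0 = a_n = 0$ and $b_n = 2(-1)^{n+1}/n$ appear in the example of term-by-term differentiation above. The integrated series should reproduce $\int_{-\pi}^{x}t\,dt = (x^2 - \pi^2)/2$ uniformly on $[-\pi, \pi]$. Each omitted term is at most $4/n^2$ in absolute value, so truncating after $N$ terms costs less than $4/N$; the grid and the value $N = 1000$ are arbitrary choices.

```python
import math

def integrated_series(x, N):
    """Truncated right-hand side of Theorem 11.3.2 for f(t) = t on [-pi, pi]."""
    total = 0.0  # a_0 = 0 and a_n = 0 here, so only the b_n terms contribute
    for n in range(1, N + 1):
        b_n = 2 * (-1) ** (n + 1) / n
        total -= (b_n / n) * (math.cos(n * x) - math.cos(n * math.pi))
    return total

def exact_integral(x):
    """Integral of t from -pi to x."""
    return (x * x - math.pi ** 2) / 2

xs = [-math.pi + 2 * math.pi * j / 100 for j in range(101)]
worst = max(abs(integrated_series(x, 1000) - exact_integral(x)) for x in xs)
print(worst)  # stays below the 4/N = 0.004 truncation bound
```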


11.4. THE FOURIER INTEGRAL 


We have so far considered Fourier series corresponding to a function defined on the interval $[-\pi, \pi]$. As was seen earlier in this chapter, if a function is initially defined on $[-\pi, \pi]$, we can extend its definition outside $[-\pi, \pi]$ by considering its periodic extension. For example, if $f(-\pi) = f(\pi)$, then we can define $f(x)$ everywhere in $(-\infty, \infty)$ by requiring that $f(x + 2\pi) = f(x)$ for all $x$. The choice of the interval $[-\pi, \pi]$ was made mainly for convenience. More generally, we can now consider a function $f(x)$ defined on the interval $[-c, c]$. For such a function, the corresponding Fourier series is given by


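$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{n\pi x}{c}\right) + b_n\sin\left(\frac{n\pi x}{c}\right)\right], \qquad (11.38)$$

where

$$a_n = \frac{1}{c}\int_{-c}^{c}f(x)\cos\left(\frac{n\pi x}{c}\right)dx, \quad n = 0, 1, 2, \ldots, \qquad (11.39)$$

$$b_n = \frac{1}{c}\int_{-c}^{c}f(x)\sin\left(\frac{n\pi x}{c}\right)dx, \quad n = 1, 2, \ldots. \qquad (11.40)$$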


Now, a question arises as to what to do when we have a function $f(x)$ that is already defined everywhere on $(-\infty, \infty)$, but is not periodic. We shall show that, under certain conditions, such a function can be represented by an infinite integral rather than by an infinite series. This integral is called a Fourier integral. We now show the development of such an integral.

Substituting the expressions for $a_n$ and $b_n$ given by (11.39) and (11.40) into (11.38), we obtain the Fourier series

$$\frac{1}{2c}\int_{-c}^{c}f(t)\,dt + \frac{1}{c}\sum_{n=1}^{\infty}\int_{-c}^{c}f(t)\cos\left[\frac{n\pi}{c}(t - x)\right]dt. \qquad (11.41)$$


If $c$ is finite and $f(x)$ satisfies the conditions of Corollary 11.2.3 on $[-c, c]$, then the Fourier series (11.41) converges to $\frac{1}{2}[f(x^-) + f(x^+)]$. However, this series representation of $f(x)$ is not valid outside the interval $[-c, c]$ unless $f(x)$ is periodic with the period $2c$.

In order to provide a representation that is valid for all values of $x$ when $f(x)$ is not periodic, we need to consider extending the series in (11.41) by letting $c$ go to infinity, assuming that $f(x)$ is absolutely integrable over the whole real line. We now show how this can be done.

As $c \to \infty$, the first term in (11.41) goes to zero provided that $\int_{-\infty}^{\infty}f(t)\,dt$ exists. To investigate the limit of the series in (11.41) as $c \to \infty$, we set $\lambda_1 = \pi/c$, $\lambda_2 = 2\pi/c, \ldots, \lambda_n = n\pi/c, \ldots$, and $\Delta\lambda_n = \lambda_{n+1} - \lambda_n = \pi/c$, $n = 1, 2, \ldots$. We can then write


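$$\frac{1}{c}\sum_{n=1}^{\infty}\int_{-c}^{c}f(t)\cos\left[\frac{n\pi}{c}(t - x)\right]dt = \frac{1}{\pi}\sum_{n=1}^{\infty}\Delta\lambda_n\int_{-c}^{c}f(t)\cos[\lambda_n(t - x)]\,dt. \qquad (11.42)$$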


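When $c$ is large, $\Delta\lambda_n$ is small, and the right-hand side of (11.42) will be an approximation of the integral

$$\frac{1}{\pi}\int_{0}^{\infty}\left\{\int_{-\infty}^{\infty}f(t)\cos[\lambda(t - x)]\,dt\right\}d\lambda. \qquad (11.43)$$

This is the Fourier integral of $f(x)$. Expanding $\cos[\lambda(t - x)] = \cos\lambda t\cos\lambda x + \sin\lambda t\sin\lambda x$, we note that (11.43) can be written as

$$\int_{0}^{\infty}[a(\lambda)\cos\lambda x + b(\lambda)\sin\lambda x]\,d\lambda, \qquad (11.44)$$

where

$$a(\lambda) = \frac{1}{\pi}\int_{-\infty}^{\infty}f(t)\cos\lambda t\,dt, \qquad b(\lambda) = \frac{1}{\pi}\int_{-\infty}^{\infty}f(t)\sin\lambda t\,dt.$$

The expression in (11.44) resembles a Fourier series where the sum has been replaced by an integral and the parameter $\lambda$ is used in place of the integer $n$. Moreover, $a(\lambda)$ and $b(\lambda)$ act like Fourier coefficients.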
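The coefficients $a(\lambda)$ and $b(\lambda)$ can be approximated by quadrature once the real line is truncated. The Python sketch below assumes $f(t) = e^{-|t|}$ (the function of Example 11.6.1), truncates to $[-40, 40]$, where the neglected tail is of order $e^{-40}$, and applies composite Simpson's rule with the kink at $t = 0$ on a panel boundary; the helper names and panel count are illustrative. Because $f$ is even, $b(\lambda)$ should vanish, while $a(\lambda) = (2/\pi)\int_0^{\infty}e^{-t}\cos\lambda t\,dt = 2/[\pi(1 + \lambda^2)]$.

```python
import math

def simpson(g, lo, hi, n):
    """Composite Simpson rule for g on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * g(lo + j * h)
    return s * h / 3

def fourier_integral_coeffs(f, lam, T=40.0, n=8000):
    """a(lambda), b(lambda) of (11.44), with the real line truncated to [-T, T]."""
    a = simpson(lambda t: f(t) * math.cos(lam * t), -T, T, n) / math.pi
    b = simpson(lambda t: f(t) * math.sin(lam * t), -T, T, n) / math.pi
    return a, b

f = lambda t: math.exp(-abs(t))
for lam in (0.0, 0.5, 1.0, 3.0):
    a, b = fourier_integral_coeffs(f, lam)
    print(lam, a, 2 / (math.pi * (1 + lam ** 2)), b)
```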

We now show that the Fourier integral in (11.43) provides a representation for $f(x)$ provided that $f(x)$ satisfies the conditions of the next theorem.

Theorem 11.4.1. Let $f(x)$ be piecewise continuous on every finite interval $[a, b]$. If $\int_{-\infty}^{\infty}|f(x)|\,dx$ exists, then at every point $x$ $(-\infty < x < \infty)$ where $f'(x^-)$ and $f'(x^+)$ exist, the Fourier integral of $f(x)$ converges to $\frac{1}{2}[f(x^-) + f(x^+)]$, that is,

$$\frac{1}{\pi}\int_{0}^{\infty}\left\{\int_{-\infty}^{\infty}f(t)\cos[\lambda(t - x)]\,dt\right\}d\lambda = \frac{1}{2}[f(x^-) + f(x^+)].$$


The proof of this theorem depends on the following lemmas:

Lemma 11.4.1. If $f(x)$ is piecewise continuous on $[a, b]$, then

$$\lim_{n\to\infty}\int_{a}^{b}f(x)\sin nx\,dx = 0, \qquad (11.45)$$

$$\lim_{n\to\infty}\int_{a}^{b}f(x)\cos nx\,dx = 0. \qquad (11.46)$$


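Proof. Let the interval $[a, b]$ be divided into a finite number of subintervals on each of which $f(x)$ is continuous. Let any one of these subintervals be denoted by $[p, q]$. To prove formula (11.45) we only need to show that

$$\lim_{n\to\infty}\int_{p}^{q}f(x)\sin nx\,dx = 0. \qquad (11.47)$$

For this purpose, we divide the interval $[p, q]$ into $k$ equal subintervals using the partition points $x_0 = p, x_1, x_2, \ldots, x_k = q$. We can then write the integral in (11.47) as

$$\sum_{i=0}^{k-1}\int_{x_i}^{x_{i+1}}f(x)\sin nx\,dx,$$

or equivalently as

$$\sum_{i=0}^{k-1}\left\{f(x_i)\int_{x_i}^{x_{i+1}}\sin nx\,dx + \int_{x_i}^{x_{i+1}}[f(x) - f(x_i)]\sin nx\,dx\right\}.$$

It follows that

$$\left|\int_{p}^{q}f(x)\sin nx\,dx\right| \le \sum_{i=0}^{k-1}|f(x_i)|\left|\frac{\cos nx_i - \cos nx_{i+1}}{n}\right| + \sum_{i=0}^{k-1}\int_{x_i}^{x_{i+1}}|f(x) - f(x_i)|\,dx.$$

Let $M$ denote the maximum value of $|f(x)|$ on $[p, q]$. Then, since $|\cos nx_i - \cos nx_{i+1}| \le 2$,

$$\left|\int_{p}^{q}f(x)\sin nx\,dx\right| \le \frac{2Mk}{n} + \sum_{i=0}^{k-1}\int_{x_i}^{x_{i+1}}|f(x) - f(x_i)|\,dx. \qquad (11.48)$$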


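Furthermore, since $f(x)$ is continuous on $[p, q]$, it is uniformly continuous there [if necessary, $f(x)$ can be made continuous at $p$, $q$ by simply using $f(p^+)$ and $f(q^-)$ as the values of $f(x)$ at $p$, $q$, respectively]. Hence, for a given $\epsilon > 0$, there exists a $\delta > 0$ such that

$$|f(x_1) - f(x_2)| < \frac{\epsilon}{2(q - p)} \qquad (11.49)$$

if $|x_1 - x_2| < \delta$, where $x_1$ and $x_2$ are points in $[p, q]$. If $k$ is chosen large enough so that $|x_{i+1} - x_i| < \delta$, and hence $|x - x_i| < \delta$ if $x_i \le x \le x_{i+1}$, then from (11.48) we obtain

$$\left|\int_{p}^{q}f(x)\sin nx\,dx\right| < \frac{2Mk}{n} + \frac{\epsilon}{2(q - p)}\sum_{i=0}^{k-1}\int_{x_i}^{x_{i+1}}dx,$$

or

$$\left|\int_{p}^{q}f(x)\sin nx\,dx\right| < \frac{2Mk}{n} + \frac{\epsilon}{2},$$

since

$$\sum_{i=0}^{k-1}\int_{x_i}^{x_{i+1}}dx = \sum_{i=0}^{k-1}(x_{i+1} - x_i) = q - p.$$

Choosing $n$ large enough so that $2Mk/n < \epsilon/2$, we finally get

$$\left|\int_{p}^{q}f(x)\sin nx\,dx\right| < \epsilon. \qquad (11.50)$$

Formula (11.45) follows from (11.50), since $\epsilon > 0$ is arbitrary. Formula (11.46) can be proved in a similar fashion. □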
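Lemma 11.4.1 is easy to observe numerically. The Python sketch below assumes the illustrative piecewise continuous function $f(x) = x^2$ on $[0, 1)$ and $f(x) = 3$ on $[1, 2]$, which jumps at $x = 1$. It integrates each continuous piece separately with composite Simpson's rule (as the proof does with the subintervals $[p, q]$) and compares the result with the closed form obtained by integrating by parts, $-\cos n/n + 2\sin n/n^2 + 2(\cos n - 1)/n^3 + 3(\cos n - \cos 2n)/n$. The integrals shrink roughly like $1/n$; the names and panel count are assumptions of the sketch.

```python
import math

def simpson(g, lo, hi, panels=8000):
    """Composite Simpson rule for g on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    s = g(lo) + g(hi)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(lo + j * h)
    return s * h / 3

def sine_integral_numeric(n):
    """Integral over [0, 2] of f(x) sin(nx), split at the jump x = 1."""
    left = simpson(lambda x: x * x * math.sin(n * x), 0.0, 1.0)
    right = simpson(lambda x: 3.0 * math.sin(n * x), 1.0, 2.0)
    return left + right

def sine_integral_exact(n):
    """The same integral in closed form (integration by parts on [0, 1])."""
    left = -math.cos(n) / n + 2 * math.sin(n) / n ** 2 + 2 * (math.cos(n) - 1) / n ** 3
    right = 3 * (math.cos(n) - math.cos(2 * n)) / n
    return left + right

for n in (1, 10, 100, 400):
    print(n, sine_integral_numeric(n), sine_integral_exact(n))
```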

Lemma 11.4.2. If $f(x)$ is piecewise continuous on $[0, b]$ and $f'(0^+)$ exists, then

$$\lim_{n\to\infty}\int_{0}^{b}f(x)\frac{\sin nx}{x}\,dx = \frac{\pi}{2}f(0^+).$$

Proof. We have that

$$\int_{0}^{b}f(x)\frac{\sin nx}{x}\,dx = f(0^+)\int_{0}^{b}\frac{\sin nx}{x}\,dx + \int_{0}^{b}\frac{f(x) - f(0^+)}{x}\sin nx\,dx. \qquad (11.51)$$

But

$$\lim_{n\to\infty}\int_{0}^{b}\frac{\sin nx}{x}\,dx = \lim_{n\to\infty}\int_{0}^{bn}\frac{\sin x}{x}\,dx = \int_{0}^{\infty}\frac{\sin x}{x}\,dx = \frac{\pi}{2}$$

(see Gillespie, 1959, page 89). Furthermore, the function $(1/x)[f(x) - f(0^+)]$ is piecewise continuous on $[0, b]$, since $f(x)$ is, and

$$\lim_{x\to 0^+}\frac{f(x) - f(0^+)}{x} = f'(0^+),$$

which exists. Hence, by Lemma 11.4.1,

$$\lim_{n\to\infty}\int_{0}^{b}\frac{f(x) - f(0^+)}{x}\sin nx\,dx = 0.$$

From (11.51) we then have

$$\lim_{n\to\infty}\int_{0}^{b}f(x)\frac{\sin nx}{x}\,dx = \frac{\pi}{2}f(0^+).$$

□


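Lemma 11.4.3. If $f(x)$ is piecewise continuous on $[a, b]$, and $f'(x_0^-)$, $f'(x_0^+)$ exist at $x_0$, $a < x_0 < b$, then

$$\lim_{n\to\infty}\int_{a}^{b}f(x)\frac{\sin[n(x - x_0)]}{x - x_0}\,dx = \frac{\pi}{2}[f(x_0^-) + f(x_0^+)].$$

Proof. Splitting the integral at $x_0$ and substituting $x_0 - x$ and $x_0 + x$, respectively, for $x$ in the two pieces, we have that

$$\int_{a}^{b}f(x)\frac{\sin[n(x - x_0)]}{x - x_0}\,dx = \int_{a}^{x_0}f(x)\frac{\sin[n(x - x_0)]}{x - x_0}\,dx + \int_{x_0}^{b}f(x)\frac{\sin[n(x - x_0)]}{x - x_0}\,dx$$

$$= \int_{0}^{x_0 - a}f(x_0 - x)\frac{\sin nx}{x}\,dx + \int_{0}^{b - x_0}f(x_0 + x)\frac{\sin nx}{x}\,dx.$$

Lemma 11.4.2 applies to each of the above integrals, since the right-hand derivatives of $f(x_0 - x)$ and $f(x_0 + x)$ at $x = 0$ are $-f'(x_0^-)$ and $f'(x_0^+)$, respectively, and both derivatives exist. Furthermore,

$$\lim_{x\to 0^+}f(x_0 - x) = f(x_0^-)$$

and

$$\lim_{x\to 0^+}f(x_0 + x) = f(x_0^+).$$

It follows that

$$\lim_{n\to\infty}\int_{a}^{b}f(x)\frac{\sin[n(x - x_0)]}{x - x_0}\,dx = \frac{\pi}{2}[f(x_0^-) + f(x_0^+)]. \quad □$$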

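Proof of Theorem 11.4.1. The function $f(x)$ satisfies the conditions of Lemma 11.4.3 on the interval $[a, b]$. Hence, at any point $x$, $a < x < b$, where $f'(x^-)$ and $f'(x^+)$ exist (the proofs of Lemmas 11.4.1 to 11.4.3 remain valid when the integer $n$ is replaced by a real parameter $\lambda$),

$$\lim_{\lambda\to\infty}\int_{a}^{b}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt = \frac{\pi}{2}[f(x^-) + f(x^+)]. \qquad (11.52)$$

Let us now partition the integral

$$I = \int_{-\infty}^{\infty}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt$$

as

$$I = \int_{-\infty}^{a}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt + \int_{a}^{b}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt + \int_{b}^{\infty}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt. \qquad (11.53)$$

From the first integral in (11.53) we have

$$\left|\int_{-\infty}^{a}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt\right| \le \int_{-\infty}^{a}\frac{|f(t)|}{|t - x|}\,dt.$$

Since $t \le a$ and $a < x$, then $|t - x| \ge x - a$. Hence,

$$\int_{-\infty}^{a}\frac{|f(t)|}{|t - x|}\,dt \le \frac{1}{x - a}\int_{-\infty}^{a}|f(t)|\,dt. \qquad (11.54)$$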



The integral on the right-hand side of (11.54) exists because $\int_{-\infty}^{\infty}|f(t)|\,dt$ does. Similarly, from the third integral in (11.53) we have, if $x < b$,

$$\left|\int_{b}^{\infty}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt\right| \le \int_{b}^{\infty}\frac{|f(t)|}{|t - x|}\,dt \le \frac{1}{b - x}\int_{b}^{\infty}|f(t)|\,dt \le \frac{1}{b - x}\int_{-\infty}^{\infty}|f(t)|\,dt.$$

Hence, the first and third integrals in (11.53) are convergent. It follows that for any $\epsilon > 0$, there exists a positive number $N$ such that if $a < -N$ and $b > N$, then these integrals will each be less than $\epsilon/3$ in absolute value, uniformly in $\lambda$, since the bounds above do not depend on $\lambda$. Furthermore, by (11.52), the absolute value of the difference between the second integral in (11.53) and the value $(\pi/2)[f(x^-) + f(x^+)]$ can be made less than $\epsilon/3$, if $\lambda$ is chosen large enough. Consequently, the absolute value of the difference between the value of the integral $I$ and $(\pi/2)[f(x^-) + f(x^+)]$ will be less than $\epsilon$, if $\lambda$ is chosen large enough. Thus,

$$\lim_{\lambda\to\infty}\int_{-\infty}^{\infty}f(t)\frac{\sin[\lambda(t - x)]}{t - x}\,dt = \frac{\pi}{2}[f(x^-) + f(x^+)]. \qquad (11.55)$$


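The expression $\sin[\lambda(t - x)]/(t - x)$ in (11.55) can be written as

$$\frac{\sin[\lambda(t - x)]}{t - x} = \int_{0}^{\lambda}\cos[\alpha(t - x)]\,d\alpha.$$

Formula (11.55) can then be expressed as

$$\frac{1}{2}[f(x^-) + f(x^+)] = \frac{1}{\pi}\lim_{\lambda\to\infty}\int_{-\infty}^{\infty}f(t)\left\{\int_{0}^{\lambda}\cos[\alpha(t - x)]\,d\alpha\right\}dt$$

$$= \frac{1}{\pi}\lim_{\lambda\to\infty}\int_{0}^{\lambda}d\alpha\int_{-\infty}^{\infty}f(t)\cos[\alpha(t - x)]\,dt. \qquad (11.56)$$

The change of the order of integration in (11.56) is valid because the integrand in (11.56) does not exceed $|f(t)|$ in absolute value, so that the integral $\int_{-\infty}^{\infty}f(t)\cos[\alpha(t - x)]\,dt$ converges uniformly for all $\alpha$ (see Carslaw, 1930, page 199; Pinkus and Zafrany, 1997, page 187). From (11.56) we finally obtain

$$\frac{1}{2}[f(x^-) + f(x^+)] = \frac{1}{\pi}\int_{0}^{\infty}\left\{\int_{-\infty}^{\infty}f(t)\cos[\alpha(t - x)]\,dt\right\}d\alpha.$$

□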
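Formula (11.55), the key limit obtained before the order of integration is changed, can be checked numerically. The Python sketch below assumes the box function $f(t) = 1$ for $|t| \le 1$ and $f(t) = 0$ otherwise, which is piecewise continuous, absolutely integrable, and has one-sided derivatives everywhere, so the integral in (11.55) reduces to one over $[-1, 1]$. After division by $\pi$ it should be close to 1 at $x = 0$, to the midpoint value $\frac{1}{2}[f(1^-) + f(1^+)] = \frac{1}{2}$ at the jump $x = 1$, and to 0 at $x = 2$. The value $\lambda = 200$ and the Simpson panel count are illustrative; for finite $\lambda$ the error here is roughly of order $1/\lambda$.

```python
import math

def kernel(lam, u):
    """sin(lam * u) / u, with its limiting value lam at u = 0."""
    return lam if u == 0 else math.sin(lam * u) / u

def box_fourier_integral(x, lam, panels=20000):
    """(1/pi) * integral over [-1, 1] of sin(lam (t - x)) / (t - x) dt, by composite Simpson."""
    h = 2.0 / panels
    s = kernel(lam, -1.0 - x) + kernel(lam, 1.0 - x)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * kernel(lam, -1.0 + j * h - x)
    return s * h / (3 * math.pi)

for x in (0.0, 1.0, 2.0):
    print(x, box_fourier_integral(x, 200.0))
```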





11.5. APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC POLYNOMIALS

By a trigonometric polynomial of the $n$th order it is meant an expression of the form

$$t_n(x) = \frac{\alpha_0}{2} + \sum_{k=1}^{n}[\alpha_k\cos kx + \beta_k\sin kx]. \qquad (11.57)$$


A theorem of Weierstrass states that any continuous function of period $2\pi$ can be uniformly approximated by a trigonometric polynomial of some order (see, for example, Tolstov, 1962, Chapter 5). Thus, for a given $\epsilon > 0$, there exists a trigonometric polynomial of the form (11.57) such that

$$|f(x) - t_n(x)| < \epsilon$$

for all values of $x$. In case the Fourier series for $f(x)$ is uniformly convergent, $t_n(x)$ can be chosen to be equal to $s_n(x)$, the $n$th partial sum of the Fourier series. However, $t_n(x)$ cannot in general be taken to be a partial sum of the Fourier series for $f(x)$, since a continuous function may have a divergent Fourier series (see Jackson, 1941, page 26). We now show that $s_n(x)$ has a certain optimal property among all trigonometric polynomials of the same order. To demonstrate this fact, let $f(x)$ be Riemann integrable on $[-\pi, \pi]$, and let $s_n(x)$ be the partial sum of order $n$ of its Fourier series, that is, $s_n(x) = a_0/2 + \sum_{k=1}^{n}[a_k\cos kx + b_k\sin kx]$. Let $r_n(x) = f(x) - s_n(x)$. Then, from (11.2),

$$\int_{-\pi}^{\pi}f(x)\cos kx\,dx = \int_{-\pi}^{\pi}s_n(x)\cos kx\,dx = \pi a_k, \quad k = 0, 1, \ldots, n.$$

Hence,

$$\int_{-\pi}^{\pi}r_n(x)\cos kx\,dx = 0 \quad \text{for } k \le n. \qquad (11.58)$$

We can similarly show that

$$\int_{-\pi}^{\pi}r_n(x)\sin kx\,dx = 0 \quad \text{for } k \le n. \qquad (11.59)$$



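Now, let $u_n(x) = t_n(x) - s_n(x)$, where $t_n(x)$ is given by (11.57). Then

$$\int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx = \int_{-\pi}^{\pi}[r_n(x) - u_n(x)]^2\,dx$$

$$= \int_{-\pi}^{\pi}r_n^2(x)\,dx - 2\int_{-\pi}^{\pi}r_n(x)u_n(x)\,dx + \int_{-\pi}^{\pi}u_n^2(x)\,dx$$

$$= \int_{-\pi}^{\pi}[f(x) - s_n(x)]^2\,dx + \int_{-\pi}^{\pi}u_n^2(x)\,dx, \qquad (11.60)$$

since $u_n(x)$ is itself a trigonometric polynomial of order $n$, so that, by (11.58) and (11.59),

$$\int_{-\pi}^{\pi}r_n(x)u_n(x)\,dx = 0.$$

From (11.60) it follows that

$$\int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx \ge \int_{-\pi}^{\pi}[f(x) - s_n(x)]^2\,dx. \qquad (11.61)$$

This shows that among all trigonometric polynomials $t_n(x)$ of order $n$, $\int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx$ is minimized when $t_n(x) = s_n(x)$.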
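The minimizing property (11.61) and the decomposition (11.60) can be checked numerically. The Python sketch below assumes $f(x) = x$ on $[-\pi, \pi]$ and $n = 5$, computes $\int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx$ by composite Simpson's rule for $s_n(x)$ and for a few randomly perturbed trigonometric polynomials of the same order (the seed and perturbation size are arbitrary), and compares the excess error with $\int_{-\pi}^{\pi}u_n^2(x)\,dx = \pi[\delta\alpha_0^2/2 + \sum_{k=1}^{n}(\delta\alpha_k^2 + \delta\beta_k^2)]$, which follows from the orthogonality of $1, \cos kx, \sin kx$ on $[-\pi, \pi]$. The perturbation names $\delta\alpha_k$, $\delta\beta_k$ are introduced only for this illustration.

```python
import math
import random

def simpson(g, lo, hi, panels=2000):
    """Composite Simpson rule for g on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    s = g(lo) + g(hi)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(lo + j * h)
    return s * h / 3

def trig_poly(alpha, beta, x):
    """alpha[0]/2 + sum_{k>=1} (alpha[k] cos kx + beta[k] sin kx); beta[0] is unused."""
    return alpha[0] / 2 + sum(alpha[k] * math.cos(k * x) + beta[k] * math.sin(k * x)
                              for k in range(1, len(alpha)))

def squared_error(alpha, beta):
    """Integral over [-pi, pi] of [f(x) - t_n(x)]^2 for the illustrative choice f(x) = x."""
    return simpson(lambda x: (x - trig_poly(alpha, beta, x)) ** 2, -math.pi, math.pi)

n = 5
a = [0.0] * (n + 1)                                             # a_k = 0 for f(x) = x
b = [0.0] + [2 * (-1) ** (k + 1) / k for k in range(1, n + 1)]  # b_k = 2(-1)^(k+1)/k
best = squared_error(a, b)                                      # error of s_n(x)

random.seed(0)
trials = []
for _ in range(5):
    da = [random.uniform(-0.3, 0.3) for _ in range(n + 1)]
    db = [0.0] + [random.uniform(-0.3, 0.3) for _ in range(n)]
    alt = squared_error([p + q for p, q in zip(a, da)], [p + q for p, q in zip(b, db)])
    # Integral of u_n^2 for u_n = t_n - s_n, by orthogonality
    extra = math.pi * (da[0] ** 2 / 2 + sum(da[k] ** 2 + db[k] ** 2 for k in range(1, n + 1)))
    trials.append((alt, extra))
    print(best, alt, alt - best, extra)
```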


11.5.1. Parseval's Theorem

Suppose that we have the Fourier series (11.5) for the function $f(x)$, which is assumed to be continuous of period $2\pi$. Let $s_n(x)$ be the $n$th partial sum of the series. We recall from the proof of Lemma 11.2.1 that

$$\int_{-\pi}^{\pi}[f(x) - s_n(x)]^2\,dx = \int_{-\pi}^{\pi}f^2(x)\,dx - \pi\left[\frac{a_0^2}{2} + \sum_{k=1}^{n}(a_k^2 + b_k^2)\right]. \qquad (11.62)$$

We also recall that for a given $\epsilon > 0$, there exists a trigonometric polynomial $t_n(x)$ of order $n$ such that

$$|f(x) - t_n(x)| < \epsilon.$$

Hence,

$$\int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx < 2\pi\epsilon^2.$$

Applying (11.61), we obtain

$$\int_{-\pi}^{\pi}[f(x) - s_n(x)]^2\,dx \le \int_{-\pi}^{\pi}[f(x) - t_n(x)]^2\,dx < 2\pi\epsilon^2. \qquad (11.63)$$

Since $t_n(x)$ is also a trigonometric polynomial of every order $m \ge n$, (11.61) gives the same bound for $s_m(x)$ for all $m \ge n$. Since $\epsilon > 0$ is arbitrary, we may conclude from (11.62) and (11.63) that the limit of the right-hand side of (11.62) is zero as $n \to \infty$, that is,

$$\frac{1}{\pi}\int_{-\pi}^{\pi}f^2(x)\,dx = \frac{a_0^2}{2} + \sum_{k=1}^{\infty}(a_k^2 + b_k^2).$$

This result is known as Parseval's theorem after Marc Antoine Parseval (1755-1836).
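Parseval's theorem can be checked on the continuous $2\pi$-periodic function $f(x) = |x|$ (an illustrative choice), for which $a_0 = \pi$, $a_k = -4/(\pi k^2)$ for odd $k$, $a_k = 0$ for even $k \ge 2$, and $b_k = 0$. The left-hand side is $(1/\pi)\int_{-\pi}^{\pi}x^2\,dx = 2\pi^2/3$, and, by Bessel's inequality, the partial sums of the right-hand side increase toward it. The Python sketch below computes those partial sums.

```python
import math

lhs = (1 / math.pi) * (2 * math.pi ** 3 / 3)  # (1/pi) * integral of x^2 over [-pi, pi]

def parseval_partial(N):
    """a_0^2/2 + sum_{k=1}^{N} (a_k^2 + b_k^2) for f(x) = |x|."""
    total = math.pi ** 2 / 2  # a_0 = pi
    for k in range(1, N + 1, 2):  # only odd k contribute
        total += (4 / (math.pi * k * k)) ** 2
    return total

for N in (1, 3, 11, 1001):
    print(N, parseval_partial(N), lhs)
```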


11.6. THE FOURIER TRANSFORM 


In the previous sections we discussed Fourier series for functions defined on a finite interval (or periodic functions defined on $R$, the set of all real numbers). In this section, we study a particular transformation of functions defined on $R$ which are not periodic.

Let $f(x)$ be defined on $R = (-\infty, \infty)$. The Fourier transform of $f(x)$ is a function defined on $R$ as

$$F(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}f(x)e^{-iwx}\,dx, \qquad (11.64)$$

where $i$ is the complex number $\sqrt{-1}$, and

$$e^{-iwx} = \cos wx - i\sin wx.$$

A proper understanding of such a transformation requires some knowledge of complex analysis, which is beyond the scope of this book. However, due to the importance and prevalence of the use of this transformation in various fields of science and engineering, some coverage of its properties is necessary. For this reason, we merely state some basic results and properties concerning this transformation. For more details, the reader is referred to standard books on Fourier series, for example, Pinkus and Zafrany (1997, Chapter 3), Kufner and Kadlec (1971, Chapter 8), and Weaver (1989, Chapter 6).

Theorem 11.6.1. If $f(x)$ is absolutely integrable on $R$, then its Fourier transform $F(w)$ exists.





Theorem 11.6.2. If $f(x)$ is piecewise continuous and absolutely integrable on $R$, then its Fourier transform $F(w)$ has the following properties:

a. $F(w)$ is a continuous function on $R$.

b. $\lim_{w\to\pm\infty}F(w) = 0$.

Note that $f(x)$ is piecewise continuous on $R$ if it is piecewise continuous on each finite interval $[a, b]$.


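Example 11.6.1. Let $f(x) = e^{-|x|}$. This function is absolutely integrable on $R$, since

$$\int_{-\infty}^{\infty}e^{-|x|}\,dx = 2\int_{0}^{\infty}e^{-x}\,dx = 2.$$

Its Fourier transform is given by

$$F(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-|x|}e^{-iwx}\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-|x|}(\cos wx - i\sin wx)\,dx$$

$$= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-|x|}\cos wx\,dx = \frac{1}{\pi}\int_{0}^{\infty}e^{-x}\cos wx\,dx,$$

where the sine term drops out because $e^{-|x|}\sin wx$ is an odd function. Integrating by parts twice, it can be shown that

$$F(w) = \frac{1}{\pi(1 + w^2)}.$$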


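Example 11.6.2. Consider the function

$$f(x) = \begin{cases} 1, & |x| \le a, \\ 0, & \text{otherwise}, \end{cases}$$

where $a$ is a finite positive number. This function is absolutely integrable on $R$, since

$$\int_{-\infty}^{\infty}|f(x)|\,dx = \int_{-a}^{a}dx = 2a.$$

Its Fourier transform is given by

$$F(w) = \frac{1}{2\pi}\int_{-a}^{a}e^{-iwx}\,dx = \frac{1}{2\pi iw}(e^{iwa} - e^{-iwa}) = \frac{\sin wa}{\pi w}, \quad w \ne 0,$$

with $F(0) = a/\pi$.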
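A numerical check of Example 11.6.2, sketched in Python under the assumption $a = 1.5$ (any positive value would do): the integral in (11.64) is evaluated over $[-a, a]$ with composite Simpson's rule, using complex arithmetic from the standard `cmath` module. The imaginary part should vanish because $f$ is even, and the real part should match $\sin(wa)/(\pi w)$, with the limiting value $a/\pi$ at $w = 0$. The function names are illustrative.

```python
import cmath
import math

def fourier_transform(f, w, lo, hi, panels=2000):
    """(1/(2 pi)) * integral over [lo, hi] of f(x) exp(-iwx) dx, by composite Simpson."""
    h = (hi - lo) / panels
    g = lambda x: f(x) * cmath.exp(-1j * w * x)
    s = g(lo) + g(hi)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(lo + j * h)
    return s * h / (6 * math.pi)  # Simpson factor h/3 times 1/(2 pi)

a = 1.5
box = lambda x: 1.0  # f(x) = 1 on [-a, a]; f vanishes outside, so [-a, a] suffices

def exact(w):
    return a / math.pi if w == 0 else math.sin(w * a) / (math.pi * w)

for w in (0.0, 0.7, 2.0, 5.0):
    print(w, fourier_transform(box, w, -a, a), exact(w))
```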

The next theorem gives the condition that makes it possible to express the function $f(x)$ in terms of its Fourier transform using the so-called inverse Fourier transform.

Theorem 11.6.3. Let $f(x)$ be piecewise continuous and absolutely integrable on $R$. Then for every point $x \in R$ where $f'(x^-)$ and $f'(x^+)$ exist, we have

$$\frac{1}{2}[f(x^-) + f(x^+)] = \int_{-\infty}^{\infty}F(w)e^{iwx}\,dw.$$

In particular, if $f(x)$ is continuous on $R$, then

$$f(x) = \int_{-\infty}^{\infty}F(w)e^{iwx}\,dw. \qquad (11.65)$$


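By applying Theorem 11.6.3 to the function in Example 11.6.1, we obtain

$$e^{-|x|} = \int_{-\infty}^{\infty}\frac{1}{\pi(1 + w^2)}e^{iwx}\,dw = \int_{-\infty}^{\infty}\frac{1}{\pi(1 + w^2)}(\cos wx + i\sin wx)\,dw$$

$$= \int_{-\infty}^{\infty}\frac{\cos wx}{\pi(1 + w^2)}\,dw = \frac{2}{\pi}\int_{0}^{\infty}\frac{\cos wx}{1 + w^2}\,dw.$$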
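The last integral can be checked numerically. The Python sketch below truncates the range of integration at an assumed cutoff $W = 1000$ and applies composite Simpson's rule. Since the integrand is bounded in absolute value by $1/(1 + w^2)$, truncation changes the result by at most $(2/\pi)(\pi/2 - \arctan W) < 10^{-3}$, so agreement with $e^{-|x|}$ to about three decimal places is expected.

```python
import math

def inverse_transform(x, W=1000.0, panels=50000):
    """(2/pi) * integral over [0, W] of cos(wx) / (1 + w^2) dw, by composite Simpson."""
    h = W / panels
    g = lambda w: math.cos(w * x) / (1 + w * w)
    s = g(0.0) + g(W)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(j * h)
    return (2 / math.pi) * s * h / 3

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, inverse_transform(x), math.exp(-abs(x)))
```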


11.6.1. Fourier Transform of a Convolution 

Let $f(x)$ and $g(x)$ be absolutely integrable functions on $R$. By definition, the function

$$h(x) = \int_{-\infty}^{\infty}f(x - y)g(y)\,dy \qquad (11.66)$$

is called the convolution of $f(x)$ and $g(x)$ and is denoted by $(f * g)(x)$.





Theorem 11.6.4. Let $f(x)$ and $g(x)$ be absolutely integrable on $R$. Let $F(w)$ and $G(w)$ be their respective Fourier transforms. Then, the Fourier transform of the convolution $(f * g)(x)$ is given by $2\pi F(w)G(w)$.
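Theorem 11.6.4 can be illustrated with $f = g = e^{-|x|}$ from Example 11.6.1. A direct calculation, splitting the integral at $y = 0$ and $y = x$, gives $(f * f)(x) = (1 + |x|)e^{-|x|}$, so the theorem predicts that the Fourier transform of $(1 + |x|)e^{-|x|}$ equals $2\pi[F(w)]^2 = 2/[\pi(1 + w^2)^2]$. The Python sketch below checks both steps by quadrature on an assumed truncated range $[-40, 40]$; the kinks are placed on subinterval endpoints so that Simpson's rule keeps its accuracy. The function names are illustrative.

```python
import math

def simpson(g, lo, hi, panels=4000):
    """Composite Simpson rule for g on [lo, hi]; returns 0 for an empty interval."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / panels
    s = g(lo) + g(hi)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(lo + j * h)
    return s * h / 3

f = lambda x: math.exp(-abs(x))

def convolution(x, T=40.0):
    """(f * f)(x) = integral of f(x - y) f(y) dy over [-T, T], split at the kinks 0 and x."""
    k1, k2 = min(0.0, x), max(0.0, x)
    g = lambda y: f(x - y) * f(y)
    return simpson(g, -T, k1) + simpson(g, k1, k2) + simpson(g, k2, T)

def transform_of_convolution(w, T=40.0):
    """(1/(2 pi)) * integral of (1 + |x|) e^{-|x|} e^{-iwx} dx; the integrand is even in x."""
    return simpson(lambda x: (1 + x) * math.exp(-x) * math.cos(w * x), 0.0, T, 8000) / math.pi

def two_pi_F_squared(w):
    F = 1 / (math.pi * (1 + w * w))  # Fourier transform from Example 11.6.1
    return 2 * math.pi * F * F

for w in (0.0, 1.0, 3.0):
    print(w, transform_of_convolution(w), two_pi_F_squared(w))
```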


11.7. APPLICATIONS IN STATISTICS 

Fourier series have been used in a wide variety of areas in statistics, such as time series, stochastic processes, approximation of probability distribution functions, and the modeling of a periodic response variable, to name just a few. In addition, the methods and results of Fourier analysis have been effectively utilized in the analytic theory of probability (see, for example, Kawata, 1972).


11.7.1. Applications in Time Series

A time series is a collection of observations made sequentially in time. Examples of time series can be found in a variety of fields ranging from economics to engineering. Many types of time series occur in the physical sciences, particularly in meteorology, such as the study of rainfall on successive days, as well as in marine science and geophysics.

The stimulus for the use of Fourier methods in time series analysis is the recognition that when observing data over time, some aspects of an observed physical phenomenon tend to exhibit cycles or periodicities. Therefore, when considering a model to represent such data, it is natural to use models that contain sines and cosines, that is, trigonometric models, to describe the behavior. Let $y_1, y_2, \ldots, y_n$ denote a time series consisting of $n$ observations obtained over time. These observations can be represented by the trigonometric polynomial model

$$y_t = \frac{a_0}{2} + \sum_{k=1}^{m}[a_k\cos\omega_k t + b_k\sin\omega_k t], \quad t = 1, 2, \ldots, n,$$

where

$$\omega_k = \frac{2\pi k}{n}, \quad k = 0, 1, 2, \ldots, m,$$

$$a_k = \frac{2}{n}\sum_{t=1}^{n}y_t\cos\omega_k t, \quad k = 0, 1, \ldots, m,$$

$$b_k = \frac{2}{n}\sum_{t=1}^{n}y_t\sin\omega_k t, \quad k = 1, 2, \ldots, m.$$


The values $\omega_1, \omega_2, \ldots, \omega_m$ are called harmonic frequencies. This model provides a decomposition of the time series into a set of cycles based on the harmonic frequencies. Here, $n$ is assumed to be odd and equal to $2m + 1$, so that the harmonic frequencies lie in the range 0 to $\pi$. The expressions for $a_k$ $(k = 0, 1, \ldots, m)$ and $b_k$ $(k = 1, 2, \ldots, m)$ were obtained by treating the model as a linear regression model with $2m + 1$ parameters and then fitting it to the $2m + 1$ observations by the method of least squares. See, for example, Fuller (1976, Chapter 7).

The quantity

$$I(\omega_k) = \frac{n}{2}(a_k^2 + b_k^2), \quad k = 1, 2, \ldots, m, \qquad (11.67)$$

represents the sum of squares associated with the frequency $\omega_k$. For $k = 1, 2, \ldots, m$, the quantities in (11.67) define the so-called periodogram.

If $y_1, y_2, \ldots, y_n$ are independently distributed as normal variates with zero means and variances $\sigma^2$, then the $a_k$'s and $b_k$'s, being linear combinations of the $y_t$'s, will be normally distributed. They are also independent, since the vectors $(\cos\omega_k t)$ and $(\sin\omega_k t)$, $t = 1, 2, \ldots, n$, are mutually orthogonal. It follows that $[n/(2\sigma^2)](a_k^2 + b_k^2)$, for $k = 1, 2, \ldots, m$, are distributed as independent chi-squared variates with two degrees of freedom each. The periodogram can be used to search for cycles or periodicities in the data.
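The formulas for $a_k$, $b_k$, and $I(\omega_k)$ translate directly into code. The Python sketch below assumes a synthetic, seeded series made of a single cosine at the harmonic frequency $\omega_4$ plus Gaussian noise, with $n = 61$ (so $m = 30$); the periodogram should peak at $k = 4$. It also confirms that, because the model has as many parameters ($2m + 1$) as there are observations, the fitted values reproduce the data exactly. Direct $O(n^2)$ sums are used for clarity; in practice a fast Fourier transform routine would typically be used instead. All names and data here are illustrative.

```python
import math
import random

def harmonic_fit(y):
    """Least-squares coefficients a_k (k = 0..m) and b_k (k = 1..m) for n = 2m + 1 observations."""
    n = len(y)
    if n % 2 == 0:
        raise ValueError("the model assumes an odd number of observations, n = 2m + 1")
    m = (n - 1) // 2
    omega = [2 * math.pi * k / n for k in range(m + 1)]
    a = [2 / n * sum(y[t - 1] * math.cos(omega[k] * t) for t in range(1, n + 1))
         for k in range(m + 1)]
    b = [0.0] + [2 / n * sum(y[t - 1] * math.sin(omega[k] * t) for t in range(1, n + 1))
                 for k in range(1, m + 1)]
    return omega, a, b

def periodogram(y):
    """I(omega_k) = (n/2)(a_k^2 + b_k^2) for k = 1..m, as in (11.67)."""
    n = len(y)
    _, a, b = harmonic_fit(y)
    return [n / 2 * (a[k] ** 2 + b[k] ** 2) for k in range(1, len(a))]

def fitted_values(y):
    """a_0/2 + sum_k [a_k cos(omega_k t) + b_k sin(omega_k t)] for t = 1..n."""
    omega, a, b = harmonic_fit(y)
    return [a[0] / 2 + sum(a[k] * math.cos(omega[k] * t) + b[k] * math.sin(omega[k] * t)
                           for k in range(1, len(a)))
            for t in range(1, len(y) + 1)]

random.seed(1)
n = 61
y = [3 * math.cos(2 * math.pi * 4 * t / n) + random.gauss(0, 0.5) for t in range(1, n + 1)]
I = periodogram(y)
k_peak = 1 + max(range(len(I)), key=lambda j: I[j])
print("periodogram peaks at k =", k_peak)
```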

Much of time series data analysis is based on the Fourier transform and its efficient computation. For more details concerning Fourier analysis of time series, the reader is referred to Bloomfield (1976) and Otnes and Enochson (1978).


11.7.2. Representation of Probability Distributions

One of the interesting applications of Fourier series in statistics is in providing a representation that can be used to evaluate the distribution function of a random variable with a finite range. Woods and Posten (1977) introduced two such representations by combining the concepts of Fourier series and Chebyshev polynomials of the first kind (see Section 10.4.1). These representations are given by the following two theorems:

Theorem 11.7.1. Let $X$ be a random variable with a cumulative distribution function $F(x)$ defined on $[0, 1]$. Then, $F(x)$ can be represented as a Fourier series of the form

$$F(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{\theta}{\pi} - \displaystyle\sum_{n=1}^{\infty}b_n\sin n\theta, & 0 \le x \le 1, \\ 1, & x > 1, \end{cases}$$

where $\theta = \mathrm{Arccos}(2x - 1)$, $b_n = [2/(n\pi)]E[T_n^*(X)]$, and $E[T_n^*(X)]$ is the expected value of the random variable

$$T_n^*(X) = \cos[n\,\mathrm{Arccos}(2X - 1)], \quad 0 \le X \le 1. \qquad (11.68)$$

Note that $T_n^*(x)$ is basically a Chebyshev polynomial of the first kind and of the $n$th degree defined on $[0, 1]$.

Proof. See Theorem 1 in Woods and Posten (1977). □

The second representation theorem is similar to Theorem 11.7.1, except that $X$ is now assumed to be a random variable over $[-1, 1]$.

Theorem 11.7.2. Let $X$ be a random variable with a cumulative distribution function $F(x)$ defined on $[-1, 1]$. Then

$$F(x) = \begin{cases} 0, & x < -1, \\ 1 - \dfrac{\theta}{\pi} - \displaystyle\sum_{n=1}^{\infty}b_n\sin n\theta, & -1 \le x \le 1, \\ 1, & x > 1, \end{cases}$$

where $\theta = \mathrm{Arccos}\,x$, $b_n = [2/(n\pi)]E[T_n(X)]$, $E[T_n(X)]$ is the expected value of the random variable

$$T_n(X) = \cos[n\,\mathrm{Arccos}\,X], \quad -1 \le X \le 1,$$

and $T_n(x)$ is the Chebyshev polynomial of the first kind and the $n$th degree [see formula (10.12)].

Proof. See Theorem 2 in Woods and Posten (1977). □


To evaluate the Fourier series representation of $F(x)$, we must first compute the coefficients $b_n$. For example, in Theorem 11.7.2, $b_n = [2/(n\pi)]E[T_n(X)]$. Since the Chebyshev polynomial $T_n(x)$ can be written in the form

$$T_n(x) = \sum_{k=0}^{n}c_{nk}x^k, \quad n = 1, 2, \ldots,$$

the computation of $b_n$ is equivalent to evaluating

$$b_n = \frac{2}{n\pi}\sum_{k=0}^{n}c_{nk}\mu_k', \quad n = 1, 2, \ldots,$$

where $\mu_k' = E(X^k)$ is the $k$th noncentral moment of $X$. The coefficients $c_{nk}$ can be obtained by using the recurrence relation (10.16), that is,

$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), \quad n = 1, 2, \ldots,$$

with $T_0(x) = 1$, $T_1(x) = x$. This allows us to evaluate the $c_{nk}$'s recursively. The series

$$F(x) = 1 - \frac{\theta}{\pi} - \sum_{n=1}^{\infty}b_n\sin n\theta$$

is then truncated at $n = N$. Thus

$$F(x) \approx 1 - \frac{\theta}{\pi} - \sum_{k=1}^{N}b_k\sin k\theta.$$

Several values of $N$ can be tried to determine the sensitivity of the approximation. We note that this series expansion provides an approximation of $F(x)$ in terms of the noncentral moments of $X$. Good estimates of these moments should therefore be available.
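The procedure just described can be sketched in a few lines of Python. The function names are illustrative, and the test case assumes $X$ uniform on $[-1, 1]$, whose moments are $E(X^k) = 1/(k + 1)$ for even $k$ and 0 for odd $k$, so that the exact answer $F(x) = (x + 1)/2$ is known. The monomial coefficients $c_{nk}$ are kept as exact Python integers; they grow quickly with $n$ and alternate in sign, so for large $N$ the floating-point sum $\sum_k c_{nk}\mu_k'$ can lose accuracy, which is one practical reason to try several moderate values of $N$.

```python
import math

def chebyshev_coeffs(N):
    """c[n][k] = coefficient of x^k in T_n(x), n = 0..N, from T_{n+1} = 2x T_n - T_{n-1}."""
    c = [[1], [0, 1]]  # T_0(x) = 1, T_1(x) = x
    for n in range(1, N):
        nxt = [0] * (n + 2)
        for k, ck in enumerate(c[n]):
            nxt[k + 1] += 2 * ck  # contribution of 2x T_n(x)
        for k, ck in enumerate(c[n - 1]):
            nxt[k] -= ck          # contribution of -T_{n-1}(x)
        c.append(nxt)
    return c

def cdf_from_moments(x, moments, N):
    """Truncated series of Theorem 11.7.2; moments[k] = E(X^k) for k = 0..N, X on [-1, 1]."""
    if x < -1:
        return 0.0
    if x > 1:
        return 1.0
    c = chebyshev_coeffs(N)
    theta = math.acos(x)
    total = 1 - theta / math.pi
    for n in range(1, N + 1):
        expected_tn = sum(c[n][k] * moments[k] for k in range(n + 1))  # E[T_n(X)]
        b_n = 2 / (n * math.pi) * expected_tn
        total -= b_n * math.sin(n * theta)
    return total

# Illustrative test case: X uniform on [-1, 1]
N = 20
moments = [1 / (k + 1) if k % 2 == 0 else 0.0 for k in range(N + 1)]
for x in (-0.5, 0.0, 0.5):
    print(x, cdf_from_moments(x, moments, N), (x + 1) / 2)
```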

It is also possible to extend the applications of Theorems 11.7.1 and 11.7.2 to a random variable $X$ with an infinite range provided that there exists a transformation which transforms $X$ to a random variable $Y$ over $[0, 1]$, or over $[-1, 1]$, such that the moments of $Y$ are known from the moments of $X$.

In another application, Fettis (1976) developed a Fourier series expansion for Pearson Type IV distributions. These are density functions, $f(x)$, that satisfy the differential equation

$$\frac{df(x)}{dx} = \frac{-(x + a)f(x)}{c_0 + c_1x + c_2x^2},$$

where $a$, $c_0$, $c_1$, and $c_2$ are constants determined from the central moments $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$, which can be estimated from the raw data. The data are standardized so that $\mu_1 = 0$, $\mu_2 = 1$. This results in the following expressions for $a$, $c_0$, $c_1$, and $c_2$:

$$c_0 = \frac{2\alpha - 1}{2(\alpha + 1)}, \qquad a = c_1 = \frac{\mu_3(\alpha - 1)}{2(\alpha + 1)}, \qquad c_2 = \frac{1}{2(\alpha + 1)},$$

where

$$\alpha = \frac{3(\mu_4 - \mu_3^2 - 1)}{2\mu_4 - 3\mu_3^2 - 6}.$$

Fettis (1976) provided additional details that explain how to approximate the cumulative distribution function,

$$F(x) = \int_{-\infty}^{x}f(t)\,dt,$$

using Fourier series.
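The constants above are simple to compute from the standardized third and fourth moments. The Python sketch below (function names are illustrative) evaluates them and, as a consistency check, compares them with the equivalent form of the Pearson-system constants written directly in terms of $\mu_3$ and $\mu_4$, namely $c_0 = (4\mu_4 - 3\mu_3^2)/D$, $c_1 = \mu_3(\mu_4 + 3)/D$, and $c_2 = (2\mu_4 - 3\mu_3^2 - 6)/D$ with $D = 10\mu_4 - 12\mu_3^2 - 18$. It also reports whether the quadratic $c_0 + c_1x + c_2x^2$ has no real roots, the case that characterizes the Type IV family. The moment values used are arbitrary.

```python
def pearson_constants(mu3, mu4):
    """(a, c0, c1, c2) for standardized data (mu1 = 0, mu2 = 1), via the auxiliary alpha."""
    alpha = 3 * (mu4 - mu3 ** 2 - 1) / (2 * mu4 - 3 * mu3 ** 2 - 6)
    c0 = (2 * alpha - 1) / (2 * (alpha + 1))
    c1 = mu3 * (alpha - 1) / (2 * (alpha + 1))
    c2 = 1 / (2 * (alpha + 1))
    return c1, c0, c1, c2  # a = c1

def pearson_constants_direct(mu3, mu4):
    """The same constants written directly in terms of mu3 and mu4."""
    D = 10 * mu4 - 12 * mu3 ** 2 - 18
    c1 = mu3 * (mu4 + 3) / D
    return c1, (4 * mu4 - 3 * mu3 ** 2) / D, c1, (2 * mu4 - 3 * mu3 ** 2 - 6) / D

def is_type_iv(c0, c1, c2):
    """True when c0 + c1 x + c2 x^2 has no real roots (negative discriminant)."""
    return c1 * c1 - 4 * c0 * c2 < 0

a, c0, c1, c2 = pearson_constants(0.5, 4.0)
print(a, c0, c1, c2, is_type_iv(c0, c1, c2))
```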


11.7.3. Regression Modeling

In regression analysis and response surface methodology, it is quite common to use polynomial models to approximate the mean $\eta$ of a response variable. There are, however, situations in which polynomial models are not adequate representatives of the mean response, as when $\eta$ is known to be a periodic function. In this case, it is more appropriate to use an approximating function which is itself periodic.

Kupper (1972) proposed using partial sums of Fourier series as possible models to approximate the mean response. Consider the following trigonometric polynomial of order $d$,

$$\eta = \alpha_0 + \sum_{n=1}^{d}[\alpha_n\cos n\phi + \beta_n\sin n\phi], \qquad (11.69)$$

where $0 \le \phi \le 2\pi$ represents either a variable taking values on the real line between 0 and $2\pi$, or the angle associated with the polar coordinates of a point on the unit circle. Let $\mathbf{u} = (u_1, u_2)'$, where $u_1 = \cos\phi$, $u_2 = \sin\phi$. Then, when $d = 2$, the model in (11.69) can be written as

$$\eta = \alpha_0 + \alpha_1u_1 + \beta_1u_2 + \alpha_2u_1^2 - \alpha_2u_2^2 + 2\beta_2u_1u_2, \qquad (11.70)$$

since $\sin 2\phi = 2\sin\phi\cos\phi = 2u_1u_2$, and $\cos 2\phi = \cos^2\phi - \sin^2\phi = u_1^2 - u_2^2$.

One of the objectives of response surface methodology is the determination of optimum settings of the model's control variables that result in a maximum (or minimum) predicted response. The predicted response $\hat{y}$ at a point provides an estimate of $\eta$ in (11.69) and is obtained by replacing $\alpha_0$, $\alpha_n$, $\beta_n$ in (11.69) by their least-squares estimates $\hat{\alpha}_0$, $\hat{\alpha}_n$, and $\hat{\beta}_n$, respectively, $n = 1, 2, \ldots, d$. For example, if $d = 2$, we have

$$\hat{y} = \hat{\alpha}_0 + \sum_{n=1}^{2}[\hat{\alpha}_n\cos n\phi + \hat{\beta}_n\sin n\phi], \qquad (11.71)$$



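which can be expressed using (11.70) as

$$\hat{y} = \hat{\alpha}_0 + \mathbf{u}'\hat{\mathbf{b}} + \mathbf{u}'\hat{\mathbf{B}}\mathbf{u},$$

where $\hat{\mathbf{b}} = (\hat{\alpha}_1, \hat{\beta}_1)'$ and

$$\hat{\mathbf{B}} = \begin{bmatrix} \hat{\alpha}_2 & \hat{\beta}_2 \\ \hat{\beta}_2 & -\hat{\alpha}_2 \end{bmatrix},$$

with $\mathbf{u}'\mathbf{u} = 1$. The method of Lagrange multipliers can then be used to determine the stationary points of $\hat{y}$ subject to the constraint $\mathbf{u}'\mathbf{u} = 1$. Details of this procedure are given in Kupper (1972, Section 3). In a follow-up paper, Kupper (1973) presented some results on the construction of optimal designs for model (11.69).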

More recently, Anderson-Cook (2000) used model (11.71) in experimental situations involving cylindrical data. For such data, it is of interest to model the relationship between two correlated components, one a standard linear measurement $y$, and the other an angular measure $\phi$. Such data arise, for example, in biology (plant or animal migration patterns) and geology (direction and magnitude of magnetic fields). The fitting of model (11.71) is done by using the method of ordinary least squares with the assumption that $y$ is normally distributed and has a constant variance. Anderson-Cook used an example, originally presented in Mardia and Sutton (1978), of a cylindrical data set in which $y$ is temperature (measured in degrees Fahrenheit) and $\phi$ is wind direction (measured in radians). Based on this example, the fitted model is

$$\hat{y} = 41.33 - 2.43\cos\phi - 2.60\sin\phi + 3.05\cos 2\phi + 2.98\sin 2\phi.$$

The corresponding standard errors of $\hat{\alpha}_0$, $\hat{\alpha}_1$, $\hat{\beta}_1$, $\hat{\alpha}_2$, $\hat{\beta}_2$ are 1.1896, 1.6608, 1.7057, 1.4029, and 1.7172, respectively. Both $\alpha_0$ and $\alpha_2$ are significant parameters at the 5% level, and $\beta_2$ is significant at the 10% level.
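As an illustration of locating the stationary points of (11.71), the Python sketch below plugs in the fitted coefficients reported above. In place of the Lagrange-multiplier calculation described in Kupper (1972), it simply evaluates $\hat{y}$ on a fine grid of angles, a deliberately simple substitute technique. It then checks the Lagrange condition at the grid optimum: at a constrained stationary point the gradient $\hat{\mathbf{g}} = \hat{\mathbf{b}} + 2\hat{\mathbf{B}}\mathbf{u}$ must be parallel to $\mathbf{u}$, so the quantity $u_1\hat{g}_2 - u_2\hat{g}_1$, which equals $d\hat{y}/d\phi$, must vanish up to the grid resolution. It also confirms that the quadratic form $\hat{\alpha}_0 + \mathbf{u}'\hat{\mathbf{b}} + \mathbf{u}'\hat{\mathbf{B}}\mathbf{u}$ agrees with (11.71). The grid size and function names are illustrative.

```python
import math

# Fitted coefficients reported above (Anderson-Cook, 2000)
alpha0, alpha1, beta1, alpha2, beta2 = 41.33, -2.43, -2.60, 3.05, 2.98
b_hat = (alpha1, beta1)
B_hat = ((alpha2, beta2), (beta2, -alpha2))

def y_hat(phi):
    """Fitted model (11.71) with d = 2."""
    return (alpha0 + alpha1 * math.cos(phi) + beta1 * math.sin(phi)
            + alpha2 * math.cos(2 * phi) + beta2 * math.sin(2 * phi))

def quad_form(phi):
    """alpha0 + u'b + u'Bu with u = (cos phi, sin phi)'."""
    u = (math.cos(phi), math.sin(phi))
    Bu = [sum(B_hat[r][c] * u[c] for c in range(2)) for r in range(2)]
    return alpha0 + sum(u[r] * (b_hat[r] + Bu[r]) for r in range(2))

def lagrange_residual(phi):
    """u1*g2 - u2*g1 with g = b + 2Bu; zero exactly where g is parallel to u."""
    u = (math.cos(phi), math.sin(phi))
    g = [b_hat[r] + 2 * sum(B_hat[r][c] * u[c] for c in range(2)) for r in range(2)]
    return u[0] * g[1] - u[1] * g[0]

grid = [2 * math.pi * j / 100000 for j in range(100000)]
phi_max = max(grid, key=y_hat)
phi_min = min(grid, key=y_hat)
print("maximum near phi =", phi_max, "with predicted response", y_hat(phi_max))
print("minimum near phi =", phi_min, "with predicted response", y_hat(phi_min))
```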

11.7.4. The Characteristic Function

We have seen that the moment generating function $\phi(t)$ for a random variable $X$ is used to obtain the moments of $X$ (see Section 5.6.2 and Example 6.9.8). It may be recalled, however, that $\phi(t)$ may not be defined for all values of $t$. To generate all the moments of $X$, it is sufficient for $\phi(t)$ to be defined in a neighborhood of $t = 0$ (see Section 5.6.2). Some well-known distributions do not have moment generating functions, such as the Cauchy distribution (see Example 6.9.1).

Another function that generates the moments of a random variable in a manner similar to $\phi(t)$, but is defined for all values of $t$ and for all random variables, is the characteristic function. By definition, the characteristic function of a random variable $X$, denoted by $\phi_c(t)$, is

$$\phi_c(t) = E[e^{itX}] = \int_{-\infty}^{\infty}e^{itx}\,dF(x), \qquad (11.72)$$

where $F(x)$ is the cumulative distribution function of $X$, and $i$ is the complex number $\sqrt{-1}$. If $X$ is discrete and has the values $c_1, c_2, \ldots, c_n, \ldots$, then (11.72) takes the form

$$\phi_c(t) = \sum_{j=1}^{\infty}p(c_j)e^{itc_j}, \qquad (11.73)$$

where $p(c_j) = P[X = c_j]$, $j = 1, 2, \ldots$. If $X$ is continuous with the density function $f(x)$, then

$$\phi_c(t) = \int_{-\infty}^{\infty}e^{itx}f(x)\,dx. \qquad (11.74)$$

The function $\phi_c(t)$ is complex-valued in general, but is defined for all values of $t$, since $e^{itx} = \cos tx + i\sin tx$, and both $\int_{-\infty}^{\infty}\cos tx\,dF(x)$ and $\int_{-\infty}^{\infty}\sin tx\,dF(x)$ exist by the fact that

$$\int_{-\infty}^{\infty}|\cos tx|\,dF(x) \le \int_{-\infty}^{\infty}dF(x) = 1, \qquad \int_{-\infty}^{\infty}|\sin tx|\,dF(x) \le \int_{-\infty}^{\infty}dF(x) = 1.$$

The characteristic function and the moment generating function, when the latter exists, are related according to the formula

$$\phi_c(t) = \phi(it).$$

Furthermore, it can be shown that if $X$ has finite moments, then they can be obtained by repeatedly differentiating $\phi_c(t)$ and evaluating the derivatives at zero, that is,

$$E(X^n) = \frac{1}{i^n}\left.\frac{d^n\phi_c(t)}{dt^n}\right|_{t=0}, \quad n = 1, 2, \ldots.$$
Although $\phi_c(t)$ generates moments, it is mainly used as a tool to derive distributions. For example, from (11.74) we note that when $X$ is continuous, the characteristic function is a Fourier-type transformation of the density function $f(x)$. This follows from (11.64), the Fourier transform of $f(x)$, which is given by $(1/2\pi)\int_{-\infty}^{\infty}f(x)e^{-itx}\,dx$. If we denote this transform by $G(t)$, then the relationship between $\phi_c(t)$ and $G(t)$ is given by

$$\phi_c(t) = 2\pi G(-t).$$

By Theorem 11.6.3, if $f(x)$ is continuous and absolutely integrable on $R$, then $f(x)$ can be derived from $\phi_c(t)$ by using formula (11.65), which can be written as

$$f(x) = \int_{-\infty}^{\infty}G(t)e^{itx}\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi_c(-t)e^{itx}\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi_c(t)e^{-itx}\,dt. \qquad (11.75)$$

This is known as the inversion formula for characteristic functions. Thus the distribution of $X$ can be uniquely determined by its characteristic function. There is therefore a one-to-one correspondence between distribution functions and their corresponding characteristic functions. This provides a useful tool for deriving distributions of random variables that cannot be easily obtained directly, but whose characteristic functions are straightforward. Waller, Turnbull, and Hardin (1995) reviewed and discussed several algorithms for inverting characteristic functions, and gave several examples from various areas in statistics. Waller (1995) demonstrated that characteristic functions provide information beyond what is given by moment generating functions. He pointed out that moment generating functions may be of more mathematical than numerical use in characterizing distributions. He used an example to illustrate that numerical techniques using characteristic functions can differentiate between two distributions, even though their moment generating functions are very similar (see also McCullagh, 1994).

Luceno (1997) provided further and more general arguments to show that characteristic functions are superior to moment generating and probability generating functions (see Section 5.6.2) in their numerical behavior.

One of the principal uses of characteristic functions is in deriving limiting 
distributions. This is based on the following theorem (see, for example, 
Pfeiffer, 1990, page 426): 


Theorem 11.7.3. Consider the sequence $\{F_n(x)\}_{n=1}^{\infty}$ of cumulative distribution functions. Let $\{\phi_{nc}(t)\}_{n=1}^{\infty}$ be the corresponding sequence of characteristic functions.

a. If $F_n(x)$ converges to a distribution function F(x) at every point of continuity for F(x), then $\phi_{nc}(t)$ converges to $\phi_c(t)$ for all t, where $\phi_c(t)$ is the characteristic function for F(x).





b. If $\phi_{nc}(t)$ converges to $\phi_c(t)$ for all t and $\phi_c(t)$ is continuous at t = 0, then $\phi_c(t)$ is the characteristic function for a distribution function F(x) such that $F_n(x)$ converges to F(x) at each point of continuity of F(x).


It should be noted that in Theorem 11.7.3 the condition that the limiting function $\phi_c(t)$ is continuous at t = 0 is essential for the validity of the theorem. The following example shows that if this condition is violated, then the theorem is no longer true:

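Consider the cumulative distribution function

$$F_n(x) = \begin{cases} 0, & x < -n, \\ \dfrac{x+n}{2n}, & -n \le x \le n, \\ 1, & x > n. \end{cases}$$

The corresponding characteristic function is

$$\phi_{nc}(t) = \frac{\sin nt}{nt}, \qquad t \neq 0, \qquad \phi_{nc}(0) = 1.$$

As $n \to \infty$, $\phi_{nc}(t)$ converges for every t to $\phi_c(t)$ defined by

$$\phi_c(t) = \begin{cases} 1, & t = 0, \\ 0, & t \neq 0. \end{cases}$$

Thus, $\phi_c(t)$ is not continuous at t = 0. We note, however, that $F_n(x) \to \frac{1}{2}$ for every fixed x. Hence, the limit of $F_n(x)$ is not a cumulative distribution function.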
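A few numbers make this counterexample concrete. In the small Python sketch below (the function names are illustrative), $\phi_{nc}(t)$ at a fixed t ≠ 0 shrinks toward 0 at a rate of at most $1/(n|t|)$. Meanwhile $F_n(x)$ at a fixed x approaches 1/2, so no limiting distribution function emerges.

```python
import math

def uniform_cf(n, t):
    """Characteristic function sin(nt)/(nt) of the uniform distribution on [-n, n]."""
    return 1.0 if t == 0 else math.sin(n * t) / (n * t)

def uniform_cdf(n, x):
    """Distribution function F_n(x) of the uniform distribution on [-n, n]."""
    if x < -n:
        return 0.0
    if x > n:
        return 1.0
    return (x + n) / (2 * n)

for n in (10, 100, 10_000):
    # at t = 0.5 the characteristic function tends to 0; at x = 3 the cdf tends to 1/2
    print(n, uniform_cf(n, 0.5), uniform_cdf(n, 3.0))
```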


Example 11.7.1. Consider the distribution defined by the density function $f(x) = e^{-x}$ for x > 0. Its characteristic function is given by

$$\phi_c(t) = \int_0^{\infty} e^{-x}e^{itx}\,dx = \frac{1}{1-it}.$$

Example 11.7.2. Consider the Cauchy density function

$$f(x) = \frac{1}{\pi(1+x^2)}, \qquad -\infty < x < \infty,$$





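given in Example 6.9.1. The characteristic function is

$$\phi_c(t) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\cos tx}{1+x^2}\,dx + \frac{i}{\pi}\int_{-\infty}^{\infty} \frac{\sin tx}{1+x^2}\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\cos tx}{1+x^2}\,dx = e^{-|t|},$$

since the integrand of the sine integral is an odd function. Note that this function is not differentiable at t = 0. We may recall from Example 6.9.1 that none of the moments of the Cauchy distribution exist.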

Example 11.7.3. In Example 6.9.8 we saw that the moment generating function for the gamma distribution G(α, β) with the density function

$$f(x) = \frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \qquad \alpha > 0, \ \beta > 0, \ 0 < x < \infty,$$

is $\phi(t) = (1-\beta t)^{-\alpha}$. Hence, its characteristic function is $\phi_c(t) = (1-i\beta t)^{-\alpha}$.
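The characteristic functions in Examples 11.7.1 through 11.7.3 can be checked by simulation, since $\phi_c(t) = E(e^{itX})$. The sketch below illustrates the formulas; it does not prove them. It uses NumPy's random generator with an arbitrary seed, a sample size of 200,000, and the illustrative gamma parameters α = 2.5 and β = 1.5. For each distribution, it compares the sample average of $e^{itX}$ with the closed form.

```python
import numpy as np

rng = np.random.default_rng(12345)   # arbitrary fixed seed for reproducibility
N = 200_000                          # Monte Carlo sample size (arbitrary)
alpha, beta = 2.5, 1.5               # illustrative gamma parameters

def empirical_cf(sample, t):
    """Sample mean of exp(i t X), an estimate of E[exp(i t X)]."""
    return np.mean(np.exp(1j * t * sample))

cases = {
    "exponential (11.7.1)": (rng.exponential(1.0, N), lambda t: 1 / (1 - 1j * t)),
    "Cauchy (11.7.2)": (rng.standard_cauchy(N), lambda t: np.exp(-abs(t))),
    "gamma (11.7.3)": (rng.gamma(alpha, beta, N), lambda t: (1 - 1j * beta * t) ** (-alpha)),
}

for name, (sample, exact) in cases.items():
    errors = [abs(empirical_cf(sample, t) - exact(t)) for t in (0.3, 1.0, 2.0)]
    print(name, max(errors))   # Monte Carlo error, typically of order 1/sqrt(N)
```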

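Example 11.7.4. The characteristic function of the standard normal distribution with the density function

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \qquad -\infty < x < \infty,$$

is

$$\phi_c(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx}e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2-2itx)}\,dx = e^{-t^2/2}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-it)^2}\,dx = e^{-t^2/2}.$$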






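Conversely, the density function can be retrieved from $\phi_c(t)$ by using the inversion formula (11.75):

$$\begin{aligned}
f(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-t^2/2}e^{-itx}\,dt \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}[t^2 + 2t(ix) + (ix)^2]}e^{(ix)^2/2}\,dt \\
&= \frac{e^{-x^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(t+ix)^2}\,dt \\
&= \frac{e^{-x^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-u^2/2}\,du \\
&= \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.
\end{aligned}$$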


11.7.4.1. Some Properties of Characteristic Functions

The book by Lukacs (1970) provides a detailed study of characteristic 
functions and their properties. Proofs of the following theorems can be found 
in Chapter 2 of that book. 

Theorem 11.7.4. Every characteristic function is uniformly continuous on the whole real line.

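Theorem 11.7.5. Suppose that $\phi_{1c}(t), \phi_{2c}(t), \ldots, \phi_{nc}(t)$ are characteristic functions. Let $a_1, a_2, \ldots, a_n$ be nonnegative numbers such that $\sum_{i=1}^{n} a_i = 1$. Then $\sum_{i=1}^{n} a_i\phi_{ic}(t)$ is also a characteristic function.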

Theorem 11.7.6. The characteristic function of the convolution of two 
distribution functions is the product of their characteristic functions. 

Theorem 11.7.7. The product of two characteristic functions is a charac- 
teristic function. 
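Theorems 11.7.5 through 11.7.7 can be illustrated by simulation in the same way. The seed, sample size, mixing weight 0.3, and the choice of exponential and standard normal components in this sketch are all arbitrary. The sum of independent draws should have a characteristic function equal to the product of the two characteristic functions. A random mixture should have the corresponding convex combination.

```python
import numpy as np

rng = np.random.default_rng(2024)    # arbitrary fixed seed
N = 200_000
t_grid = np.array([0.5, 1.0, 1.5])

def phi_exp(t):
    return 1 / (1 - 1j * t)          # exponential, Example 11.7.1

def phi_norm(t):
    return np.exp(-t**2 / 2)         # standard normal, Example 11.7.4

def empirical_cf(sample, ts):
    return np.array([np.mean(np.exp(1j * t * sample)) for t in ts])

x = rng.exponential(1.0, N)
y = rng.standard_normal(N)

# Theorem 11.7.6: X + Y (independent) has characteristic function phi_exp * phi_norm,
# which by Theorem 11.7.7 is itself a characteristic function.
conv_err = np.abs(empirical_cf(x + y, t_grid) - phi_exp(t_grid) * phi_norm(t_grid))

# Theorem 11.7.5 with a_1 = 0.3, a_2 = 0.7: take X with probability 0.3, otherwise Y.
mix = np.where(rng.random(N) < 0.3, x, y)
mix_err = np.abs(empirical_cf(mix, t_grid) - (0.3 * phi_exp(t_grid) + 0.7 * phi_norm(t_grid)))

print(conv_err.max(), mix_err.max())
```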


FURTHER READING AND ANNOTATED BIBLIOGRAPHY

Anderson-Cook, C. M. (2000). “A second order model for cylindrical data.” J. Statist. Comput. Simul., 66, 51-65.

Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. Wiley, New York. (This is an introductory text on Fourier methods written at an applied level for users of time series.)

Carslaw, H. S. (1930). Introduction to the Theory of Fourier Series and Integrals, 3rd ed. Dover, New York.

Churchill, R. V. (1963). Fourier Series and Boundary Value Problems, 2nd ed. McGraw-Hill, New York. (This text provides an introductory treatment of Fourier series and their applications to boundary value problems in partial differential equations of engineering and physics. Fourier integral representations and expansions in series of Bessel functions and Legendre polynomials are also treated.)

Davis, H. F. (1963). Fourier Series and Orthogonal Functions. Allyn & Bacon, Boston.

Fettis, H. E. (1976). “Fourier series expansions for Pearson Type IV distributions and probabilities.” SIAM J. Applied Math., 31, 511-518.

Fuller, W. A. (1976). Introduction to Statistical Time Series. Wiley, New York.

Gillespie, R. P. (1959). Integration. Oliver and Boyd, London.

Jackson, D. (1941). Fourier Series and Orthogonal Polynomials. Mathematical Association of America, Washington.

Kawata, T. (1972). Fourier Analysis in Probability Theory. Academic Press, New York. (This text presents useful results from the theories of Fourier series, Fourier transforms, Laplace transforms, and other related topics that are pertinent to the study of probability theory.)

Kufner, A., and J. Kadlec (1971). Fourier Series. Iliffe Books, The Butterworth Group, London. (This is an English translation edited by G. A. Toombs.)

Kupper, L. L. (1972). “Fourier series and spherical harmonics regression.” Appl. Statist., 21, 121-130.

Kupper, L. L. (1973). “Minimax designs for Fourier series and spherical harmonics regressions: A characterization of rotatable arrangements.” J. Roy. Statist. Soc., Ser. B, 35, 493-500.

Luceño, A. (1997). “Further evidence supporting the numerical usefulness of characteristic functions.” Amer. Statist., 51, 233-234.

Lukacs, E. (1970). Characteristic Functions, 2nd ed. Hafner, New York. (This is a classic book covering many interesting details concerning characteristic functions.)

Mardia, K. V., and T. W. Sutton (1978). “Model for cylindrical variables with applications.” J. Roy. Statist. Soc., Ser. B, 40, 229-233.

McCullagh, P. (1994). “Does the moment-generating function characterize a distribution?” Amer. Statist., 48, 208.

Otnes, R. K., and L. Enochson (1978). Applied Time Series Analysis. Wiley, New York.

Pfeiffer, P. E. (1990). Probability for Applications. Springer-Verlag, New York.

Pinkus, A., and S. Zafrany (1997). Fourier Series and Integral Transforms. Cambridge University Press, Cambridge, England.

Tolstov, G. P. (1962). Fourier Series. Dover, New York. (Translated from the Russian by Richard A. Silverman.)

Waller, L. A. (1995). “Does the characteristic function numerically distinguish distributions?” Amer. Statist., 49, 150-152.

Waller, L. A., B. W. Turnbull, and J. M. Hardin (1995). “Obtaining distribution functions by numerical inversion of characteristic functions with applications.” Amer. Statist., 49, 346-350.

Weaver, H. J. (1989). Theory of Discrete and Continuous Fourier Analysis. Wiley, New York.

Woods, J. D., and H. O. Posten (1977). “The use of Fourier series in the evaluation of probability distribution functions.” Commun. Statist. - Simul. Comput., 6, 201-219.


EXERCISES 


In Mathematics 


11.1. Expand the following functions using Fourier series:

(a) $f(x) = |x|$, $-\pi < x < \pi$.

(b) $f(x) = |\sin x|$.

(c) $f(x) = x + x^2$, $-\pi < x < \pi$.

11.2. Show that

$$\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2} = \frac{\pi^2}{8}.$$

[Hint: Use the Fourier series for $x^2$.]

11.3. Let $a_n$ and $b_n$ be the Fourier coefficients for a continuous function f(x) defined on $[-\pi, \pi]$ such that $f(-\pi) = f(\pi)$, and $f'(x)$ is piecewise continuous on $[-\pi, \pi]$. Show that

(a) $\lim_{n\to\infty} (na_n) = 0$,

(b) $\lim_{n\to\infty} (nb_n) = 0$.


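11.4. If f(x) is continuous on $[-\pi, \pi]$, $f(-\pi) = f(\pi)$, and $f'(x)$ is piecewise continuous on $[-\pi, \pi]$, then show that

$$|f(x) - s_n(x)| < \frac{c}{\sqrt{n}},$$

where

$$s_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k\cos kx + b_k\sin kx),$$

and

$$c^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} [f'(x)]^2\,dx.$$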
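The following is a numerical illustration of the bound in Exercise 11.4, not a proof. Consider f(x) = |x| on [−π, π]. Its Fourier coefficients are $a_0 = \pi$, $a_k = 2[(-1)^k - 1]/(\pi k^2)$, and $b_k = 0$. Since f′(x) = ±1, we have $c^2 = (1/\pi)\int_{-\pi}^{\pi} 1\,dx = 2$. The sketch evaluates $s_n(x)$ on a grid of 2001 points (an arbitrary choice) and compares the largest observed error with $c/\sqrt{n}$.

```python
import numpy as np

def partial_sum_abs(x, n):
    """s_n(x) for f(x) = |x| on [-pi, pi]: a_0 = pi, a_k = 2((-1)^k - 1)/(pi k^2), b_k = 0."""
    k = np.arange(1, n + 1)
    a_k = 2.0 * ((-1.0) ** k - 1.0) / (np.pi * k**2)
    return np.pi / 2 + np.cos(np.outer(x, k)) @ a_k

x = np.linspace(-np.pi, np.pi, 2001)
c = np.sqrt(2.0)        # c^2 = (1/pi) * integral of [f'(x)]^2 dx = (1/pi) * 2*pi = 2
for n in (1, 5, 25, 125):
    max_err = np.max(np.abs(np.abs(x) - partial_sum_abs(x, n)))
    print(n, max_err, c / np.sqrt(n))   # observed error stays below c/sqrt(n)
```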





11.5. Suppose that f(x) is piecewise continuous on $[-\pi, \pi]$ and has the Fourier series given in (11.32).

(a) Show that $\sum_{n=1}^{\infty} (-1)^n b_n/n$ is a convergent series.

(b) Show that $\sum_{n=1}^{\infty} b_n/n$ is convergent.

[Hint: Use Theorem 11.3.2.]

11.6. Show that the trigonometric series $\sum_{n=2}^{\infty} (\sin nx)/\log n$ is not a Fourier series of any integrable function.

[Hint: If it were a Fourier series of a function f(x), then $b_n = 1/\log n$ would be the Fourier coefficient of an odd function. Apply now part (b) of Exercise 11.5 and show that this assumption leads to a contradiction.]

11.7. Consider the Fourier series of f(x) = x given in Example 11.2.1.

(a) Show that

$$\frac{x^2}{4} = \frac{\pi^2}{12} - \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\cos nx}{n^2}, \qquad -\pi < x < \pi.$$

[Hint: Consider the Fourier series of $\int_0^x f(t)\,dt$.]

(b) Deduce that

$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2} = \frac{\pi^2}{12}.$$


11.8. Make use of the result in Exercise 11.7 to find the sum of the series $\sum_{n=1}^{\infty} [(-1)^{n+1}\sin nx]/n^3$.

11.9. Show that the Fourier transform of $f(x) = e^{-x^2}$, $-\infty < x < \infty$, is given by

$$F(w) = \frac{1}{2\sqrt{\pi}}e^{-w^2/4}.$$

[Hint: Show that $F'(w) + \frac{1}{2}wF(w) = 0$.]

11.10. Prove Theorem 11.6.4 using Fubini’s theorem.


11.11. Use the Fourier transform to solve the integral equation

$$\int_{-\infty}^{\infty} f(x-y)f(y)\,dy = e^{-x^2/2}$$

for the function f(x).





11.12. Consider the function f(x) = x, $-\pi < x < \pi$, with $f(-\pi) = f(\pi) = 0$, and f(x) $2\pi$-periodic defined on $(-\infty, \infty)$. The Fourier series of f(x) is

$$\sum_{n=1}^{\infty} \frac{2(-1)^{n+1}\sin nx}{n}.$$

Let $s_n(x)$ be the nth partial sum of this series, that is,

$$s_n(x) = \sum_{k=1}^{n} \frac{2(-1)^{k+1}}{k}\sin kx.$$

Let $x_n = \pi - \pi/n$.

(a) Show that

$$s_n(x_n) = \sum_{k=1}^{n} \frac{2\sin(k\pi/n)}{k}.$$

(b) Show that

$$\lim_{n\to\infty} s_n(x_n) \approx 1.18\pi.$$

Note: As $n \to \infty$, $x_n \to \pi^-$. Hence, for n sufficiently large,

$$s_n(x_n) - f(x_n) \approx 1.18\pi - \pi = 0.18\pi.$$

Thus, near $x = \pi$ [a point of discontinuity for f(x)], the partial sums of the Fourier series exceed the value of this function by approximately the amount $0.18\pi = 0.565$. This illustrates the so-called Gibbs phenomenon, according to which the Fourier series of f(x) “overshoots” the value of f(x) in a small neighborhood to the left of the point of discontinuity of f(x). It can also be shown that in a small neighborhood to the right of $x = -\pi$, the Fourier series of f(x) “undershoots” the value of f(x).
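The limit in part (b) can be explored numerically. The right-hand side of part (a) is a Riemann sum, with step π/n, for $2\int_0^{\pi} (\sin x/x)\,dx$, which is therefore the value the partial sums approach. The sketch below evaluates that integral with Simpson's rule from Section 12.2 and compares it with $s_n(x_n)$ for n = 10,000. The function names are hypothetical.

```python
import numpy as np

def s_n(x, n):
    """nth partial sum of the Fourier series of f(x) = x on (-pi, pi)."""
    k = np.arange(1, n + 1)
    return np.sum(2.0 * (-1.0) ** (k + 1) * np.sin(k * x) / k)

def s_n_at_xn(n):
    """Right-hand side of part (a): sum of 2 sin(k pi / n) / k."""
    k = np.arange(1, n + 1)
    return np.sum(2.0 * np.sin(k * np.pi / n) / k)

def two_si_pi(m=2000):
    """2 * integral_0^pi sin(x)/x dx by composite Simpson's rule with m (even) subintervals."""
    x = np.linspace(0.0, np.pi, m + 1)
    y = np.sinc(x / np.pi)                 # np.sinc(u) = sin(pi u)/(pi u), i.e. sin(x)/x here
    h = np.pi / m
    return 2.0 * h / 3 * (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum() + y[-1])

n = 10_000
x_n = np.pi - np.pi / n
print(s_n(x_n, n), s_n_at_xn(n))           # agree, as part (a) asserts
print(two_si_pi(), two_si_pi() / np.pi)    # about 3.7039 and 1.179
print(s_n_at_xn(n) - x_n)                  # overshoot, about 0.562
```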


In Statistics 

11.13. In the following table, two observations of the resistance in ohms are recorded at each of six equally spaced locations on the perimeter of a new type of solid circular coil (see Kupper, 1972, Table 1):

φ (radians)    Resistance (ohms)
0              13.62, 14.40
π/3            10.552, 10.602
2π/3           2.196, 3.696
π              6.39, 7.25
4π/3           8.854, 10.684
5π/3           5.408, 8.488


(a) Use the method of least squares to estimate the parameters in the following trigonometric polynomial of order 2:

$$\eta = \beta_0 + \sum_{n=1}^{2} [\beta_{1n}\cos n\phi + \beta_{2n}\sin n\phi],$$

where $0 \le \phi \le 2\pi$, and η denotes the average resistance at location φ.

(b) Use the prediction equation obtained in part (a) to determine the points of minimum and maximum resistance on the perimeter of the circular coil.

11.14. Consider the following circular data set in which φ is wind direction and y is temperature (see Anderson-Cook, 2000, Table 1).

φ (radians)   y (°F)   φ (radians)   y (°F)
4.36          52       4.54          38
3.67          41       2.62          40
4.36          41       2.97          49
1.57          31       4.01          48
3.67          53       4.19          37
3.67          47       5.59          37
6.11          43       5.59          33
5.93          43       3.32          47
0.52          41       3.67          51
3.67          46       1.22          42
3.67          48       4.54          53
3.32          52       4.19          46
4.89          43       3.49          51
3.14          46       4.71          39

Fit a second-order trigonometric polynomial to this data set, and verify that the prediction equation is given by

$$\hat{y} = 41.33 - 2.43\cos\phi - 2.60\sin\phi + 3.05\cos 2\phi + 2.98\sin 2\phi.$$
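Exercises 11.13 and 11.14 both call for a least-squares fit of a trigonometric polynomial. Below is a minimal NumPy sketch. The coefficients are ordered as the columns 1, cos φ, sin φ, cos 2φ, sin 2φ, and the function names are hypothetical. It is checked here only on synthetic, noise-free data. To fit the tables above, pass one (φ, y) pair per observation.

```python
import numpy as np

def trig_design(phi, order=2):
    """Design matrix with columns 1, cos(phi), sin(phi), ..., cos(order*phi), sin(order*phi)."""
    phi = np.asarray(phi, dtype=float)
    cols = [np.ones_like(phi)]
    for n in range(1, order + 1):
        cols += [np.cos(n * phi), np.sin(n * phi)]
    return np.column_stack(cols)

def fit_trig_poly(phi, y, order=2):
    """Least-squares coefficients, ordered as the columns of trig_design."""
    coef, *_ = np.linalg.lstsq(trig_design(phi, order), np.asarray(y, dtype=float), rcond=None)
    return coef

# Check on synthetic, noise-free data (not the data of Exercises 11.13 and 11.14):
phi = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
true_coef = np.array([40.0, -2.0, 3.0, 1.5, -0.5])
y = trig_design(phi) @ true_coef
print(fit_trig_poly(phi, y))   # recovers true_coef up to rounding
```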





11.15. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of independent, identically distributed random variables with mean μ and variance σ². Let

$$s_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}},$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.

(a) Find the characteristic function of $s_n$.

(b) Use Theorem 11.7.3 and part (a) to show that the limiting distribution of $s_n$ as $n \to \infty$ is the standard normal distribution.

Note: Part (b) represents the statement of the well-known central limit theorem. This theorem asserts that for large n, the arithmetic mean $\bar{X}_n$ of a sample of independent, identically distributed random variables is approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
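For a concrete case of Exercise 11.15, take the $X_i$ to be exponential with mean 1, so that μ = σ = 1. By Example 11.7.1, each $X_i$ has characteristic function $1/(1-it)$. The characteristic function of $\sqrt{n}(\bar{X}_n - 1)$ is then $e^{-it\sqrt{n}}(1 - it/\sqrt{n})^{-n}$. The sketch below is a special case chosen for illustration, not a solution of the general exercise. It shows this function approaching the standard normal characteristic function $e^{-t^2/2}$ at t = 1.

```python
import cmath
import math

def cf_standardized_mean_exp(t, n):
    """Characteristic function of sqrt(n) * (Xbar_n - 1) for i.i.d. exponential(1) X_i.

    Uses phi(t) = exp(-i t sqrt(n)) * (1 - i t / sqrt(n))**(-n), computed through logarithms.
    """
    r = math.sqrt(n)
    return cmath.exp(-1j * t * r - n * cmath.log(1 - 1j * t / r))

target = math.exp(-0.5)                 # standard normal characteristic function at t = 1
for n in (10, 100, 10_000, 1_000_000):
    print(n, abs(cf_standardized_mean_exp(1.0, n) - target))   # shrinks roughly like 1/sqrt(n)
```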



CHAPTER 12


Approximation of Integrals


Integration plays an important role in many fields of science and engineering. For applications, numerical values of integrals are often required. However, in many cases the evaluation of integrals, or quadrature, in terms of elementary functions may not be feasible. Hence, approximating the value of an integral in a reliable fashion is a problem of utmost importance. Numerical quadrature is in fact one of the oldest branches of mathematics. It concerns the determination, approximately or exactly, of the areas of regions bounded by lines or curves, a subject that was studied by the ancient Babylonians (see Haber, 1970). The word “quadrature” indicates the process of measuring an area inside a curve by finding a square having the same area. Probably no other problem has exercised a greater or a longer attraction than that of constructing a square equal in area to a given circle. Thousands of people have worked on this problem, including the ancient Egyptians as far back as 1800 B.C.

In this chapter, we provide an exposition of methods for approximating integrals, including those that are multidimensional.


12.1. THE TRAPEZOIDAL METHOD

This is the simplest method of approximating an integral of the form $\int_a^b f(x)\,dx$, which represents the area bounded by the curve of the function y = f(x), the x-axis, and the two lines x = a, x = b. The method is based on approximating the curve by a series of straight line segments. As a result, the area is approximated with a series of trapezoids. For this purpose, the interval from a to b is divided into n equal parts by the partition points $a = x_0, x_1, x_2, \ldots, x_n = b$. The ith trapezoid lies between $x_{i-1}$ and $x_i$. Its width is $h = (1/n)(b-a)$ and its area is given by

$$A_i = \frac{h}{2}[f(x_{i-1}) + f(x_i)], \qquad i = 1, 2, \ldots, n. \qquad (12.1)$$




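The sum, $S_n$, of $A_1, A_2, \ldots, A_n$ provides an approximation to the integral $\int_a^b f(x)\,dx$:

$$\begin{aligned}
S_n &= \sum_{i=1}^{n} A_i \\
&= \frac{h}{2}\{[f(x_0) + f(x_1)] + [f(x_1) + f(x_2)] + \cdots + [f(x_{n-1}) + f(x_n)]\} \\
&= \frac{h}{2}\left[f(x_0) + f(x_n) + 2\sum_{i=1}^{n-1} f(x_i)\right]. \qquad (12.2)
\end{aligned}$$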
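Formula (12.2) translates directly into code. Below is a minimal NumPy sketch; the function name `trapezoid` is our own and is unrelated to any library routine of the same name. The usage line applies it to $\int_1^2 dx/x$ with n = 10.

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule (12.2) with n equal subintervals of [a, b]."""
    x = np.linspace(a, b, n + 1)               # partition points x_0, ..., x_n
    y = f(x)
    h = (b - a) / n
    return h / 2 * (y[0] + y[-1] + 2 * y[1:-1].sum())

print(trapezoid(lambda x: 1 / x, 1.0, 2.0, 10))   # about 0.693771
```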


12.1.1. Accuracy of the Approximation

The accuracy of the trapezoidal method depends on the number n of trapezoids we take. The next theorem provides information concerning the error of approximation.

Theorem 12.1.1. Suppose that f(x) has a continuous second derivative on [a, b], and $|f''(x)| \le M_2$ for all x in [a, b]. Then

$$\left|\int_a^b f(x)\,dx - S_n\right| \le \frac{(b-a)^3M_2}{12n^2},$$

where $S_n$ is given by formula (12.2).

Proof. Consider the partition points $a = x_0, x_1, x_2, \ldots, x_n = b$ such that $h = x_i - x_{i-1} = (1/n)(b-a)$, $i = 1, 2, \ldots, n$. The integral of f(x) from $x_{i-1}$ to $x_i$ is

$$I_i = \int_{x_{i-1}}^{x_i} f(x)\,dx. \qquad (12.3)$$

Now, in the trapezoidal method, f(x) is approximated in the interval $[x_{i-1}, x_i]$ by the right-hand side of the straight-line equation

$$\begin{aligned}
p_i(x) &= f(x_{i-1}) + \frac{1}{h}[f(x_i) - f(x_{i-1})](x - x_{i-1}) \\
&= \frac{x_i - x}{h}f(x_{i-1}) + \frac{x - x_{i-1}}{h}f(x_i) \\
&= \frac{x - x_i}{x_{i-1} - x_i}f(x_{i-1}) + \frac{x - x_{i-1}}{x_i - x_{i-1}}f(x_i), \qquad i = 1, 2, \ldots, n.
\end{aligned}$$

Note that $p_i(x)$ is a linear Lagrange interpolating polynomial (of degree 1) with $x_{i-1}$ and $x_i$ as its points of interpolation [see formula (9.14)].





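Using Theorem 9.2.2, the error of interpolation resulting from approximating f(x) with $p_i(x)$ over $[x_{i-1}, x_i]$ is given by

$$f(x) - p_i(x) = \frac{1}{2!}f''(\xi_i)(x - x_{i-1})(x - x_i), \qquad i = 1, 2, \ldots, n, \qquad (12.4)$$

where $x_{i-1} < \xi_i < x_i$. Formula (12.4) results from applying formula (9.15). Hence, the error of approximating $I_i$ with $A_i$ in (12.1) is

$$\begin{aligned}
I_i - A_i &= \int_{x_{i-1}}^{x_i} [f(x) - p_i(x)]\,dx \\
&= \frac{1}{2!}f''(\xi_i)\int_{x_{i-1}}^{x_i} (x - x_{i-1})(x - x_i)\,dx \\
&= \frac{h}{2!}f''(\xi_i)\left[\frac{1}{3}(x_i^2 + x_{i-1}x_i + x_{i-1}^2) - \frac{1}{2}(x_{i-1} + x_i)^2 + x_{i-1}x_i\right] \\
&= -\frac{h^3}{12}f''(\xi_i).
\end{aligned}$$

In the second line, $\xi_i$ denotes a point in $(x_{i-1}, x_i)$ supplied by the mean value theorem for integrals. The theorem applies because $(x - x_{i-1})(x - x_i)$ does not change sign on $[x_{i-1}, x_i]$. The total error of approximating $\int_a^b f(x)\,dx$ with $S_n$ is then given by

$$\int_a^b f(x)\,dx - S_n = -\frac{h^3}{12}\sum_{i=1}^{n} f''(\xi_i). \qquad (12.5)$$

It follows that

$$\left|\int_a^b f(x)\,dx - S_n\right| \le \frac{nh^3M_2}{12} = \frac{(b-a)^3M_2}{12n^2}. \qquad (12.6)$$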


An alternative to approximating the integral $\int_a^b f(x)\,dx$ by a sum of trapezoids is to approximate $\int_{x_{i-1}}^{x_i} f(x)\,dx$ by a trapezoid bounded from above by the tangent to the curve of y = f(x) at the point $x_{i-1} + h/2$, which is the midpoint of the interval $[x_{i-1}, x_i]$. In this case, the area of the ith trapezoid is

$$A_i^* = hf\left(x_{i-1} + \frac{h}{2}\right), \qquad i = 1, 2, \ldots, n.$$





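Hence,

$$\int_a^b f(x)\,dx \approx h\sum_{i=1}^{n} f\left(x_{i-1} + \frac{h}{2}\right), \qquad (12.7)$$

and f(x) is approximated in the interval $[x_{i-1}, x_i]$ by

$$p_i^*(x) = f\left(x_{i-1} + \frac{h}{2}\right) + \left(x - x_{i-1} - \frac{h}{2}\right)f'\left(x_{i-1} + \frac{h}{2}\right).$$

By applying Taylor's theorem (Theorem 4.3.1) to f(x) in a neighborhood of $x_{i-1} + h/2$, we obtain

$$\begin{aligned}
f(x) &= f\left(x_{i-1} + \frac{h}{2}\right) + \left(x - x_{i-1} - \frac{h}{2}\right)f'\left(x_{i-1} + \frac{h}{2}\right) + \frac{1}{2!}\left(x - x_{i-1} - \frac{h}{2}\right)^2 f''(\xi_i) \\
&= p_i^*(x) + \frac{1}{2!}\left(x - x_{i-1} - \frac{h}{2}\right)^2 f''(\xi_i),
\end{aligned}$$

where $\xi_i$ lies between $x_{i-1} + h/2$ and x. The linear term of $p_i^*(x)$ integrates to zero over $[x_{i-1}, x_i]$, so $\int_{x_{i-1}}^{x_i} p_i^*(x)\,dx = A_i^*$. The error of approximating $\int_{x_{i-1}}^{x_i} f(x)\,dx$ with $A_i^*$ is then given by

$$\left|\int_{x_{i-1}}^{x_i} [f(x) - p_i^*(x)]\,dx\right| = \left|\frac{1}{2!}\int_{x_{i-1}}^{x_i} \left(x - x_{i-1} - \frac{h}{2}\right)^2 f''(\xi_i)\,dx\right| \le \frac{M_2}{2!}\int_{x_{i-1}}^{x_i} \left(x - x_{i-1} - \frac{h}{2}\right)^2 dx = \frac{h^3M_2}{24}.$$

Consequently, the absolute value of the total error in this case has an upper bound of the form

$$\left|\sum_{i=1}^{n}\int_{x_{i-1}}^{x_i} [f(x) - p_i^*(x)]\,dx\right| \le \frac{nh^3M_2}{24} = \frac{(b-a)^3M_2}{24n^2}. \qquad (12.8)$$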





We note that the upper bound in (12.8) is half as large as the one in (12.6). This alternative procedure is therefore slightly more precise than the original one. Both procedures produce an approximation error that is $O(1/n^2)$. It should be noted that this error does not include the roundoff errors in the computation of the areas of the approximating trapezoids.
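The two error bounds can be compared with the actual errors for $f(x) = 1/x$ on [1, 2] with n = 10, using $M_2 = \max_{1 \le x \le 2} 2/x^3 = 2$. In this sketch, both rules are implemented directly from (12.2) and (12.7); the function names are illustrative.

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule (12.2)."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h / 2 * (y[0] + y[-1] + 2 * y[1:-1].sum())

def midpoint(f, a, b, n):
    """Formula (12.7): h times the sum of f at the midpoints x_{i-1} + h/2."""
    h = (b - a) / n
    return h * f(a + h * (np.arange(n) + 0.5)).sum()

def f(x):
    return 1 / x

a, b, n = 1.0, 2.0, 10
exact = np.log(2.0)
M2 = 2.0                                      # max of |f''(x)| = 2/x^3 on [1, 2]
err_trap = abs(exact - trapezoid(f, a, b, n))
err_mid = abs(exact - midpoint(f, a, b, n))
bound_trap = (b - a) ** 3 * M2 / (12 * n**2)  # bound (12.6)
bound_mid = (b - a) ** 3 * M2 / (24 * n**2)   # bound (12.8)
print(err_trap, bound_trap)   # about 6.2e-4 versus 1.7e-3
print(err_mid, bound_mid)     # about 3.1e-4 versus 8.3e-4
```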

Example 12.1.1. Consider approximating the integral $\int_1^2 dx/x$, which has an exact value equal to log 2 = 0.693147. Let us divide the interval [1, 2] into n = 10 subintervals of length h = 1/10. Hence, $x_0 = 1, x_1 = 1.1, \ldots, x_{10} = 2.0$. Using the first trapezoidal method [formula (12.2)], we obtain

$$\int_1^2 \frac{dx}{x} \approx \frac{1}{20}\left[1 + \frac{1}{2} + 2\sum_{i=1}^{9} \frac{1}{x_i}\right] = 0.69377.$$

Using now the second trapezoidal method [formula (12.7)], we get

$$\int_1^2 \frac{dx}{x} \approx \frac{1}{10}\sum_{i=1}^{10} \frac{1}{x_{i-1} + 0.05} = 0.69284.$$


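12.2. SIMPSON’S METHOD

Let us again consider the integral $\int_a^b f(x)\,dx$. Let $a = x_0 < x_1 < \cdots < x_{2n-1} < x_{2n} = b$ be a sequence of equally spaced points that partition the interval [a, b] such that $x_i - x_{i-1} = h$, $i = 1, 2, \ldots, 2n$. Simpson’s method is based on approximating the graph of the function f(x) over the interval $[x_{i-1}, x_{i+1}]$ by a parabola which agrees with f(x) at the points $x_{i-1}$, $x_i$, and $x_{i+1}$. Thus, over $[x_{i-1}, x_{i+1}]$, f(x) is approximated with a Lagrange interpolating polynomial of degree 2 of the form [see formula (9.14)]

$$\begin{aligned}
q_i(x) &= f(x_{i-1})\frac{(x - x_i)(x - x_{i+1})}{(x_{i-1} - x_i)(x_{i-1} - x_{i+1})} + f(x_i)\frac{(x - x_{i-1})(x - x_{i+1})}{(x_i - x_{i-1})(x_i - x_{i+1})} + f(x_{i+1})\frac{(x - x_{i-1})(x - x_i)}{(x_{i+1} - x_{i-1})(x_{i+1} - x_i)} \\
&= f(x_{i-1})\frac{(x - x_i)(x - x_{i+1})}{2h^2} - f(x_i)\frac{(x - x_{i-1})(x - x_{i+1})}{h^2} + f(x_{i+1})\frac{(x - x_{i-1})(x - x_i)}{2h^2}.
\end{aligned}$$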





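It follows that

$$\begin{aligned}
\int_{x_{i-1}}^{x_{i+1}} f(x)\,dx &\approx \int_{x_{i-1}}^{x_{i+1}} q_i(x)\,dx \\
&= \frac{f(x_{i-1})}{2h^2}\int_{x_{i-1}}^{x_{i+1}} (x - x_i)(x - x_{i+1})\,dx - \frac{f(x_i)}{h^2}\int_{x_{i-1}}^{x_{i+1}} (x - x_{i-1})(x - x_{i+1})\,dx + \frac{f(x_{i+1})}{2h^2}\int_{x_{i-1}}^{x_{i+1}} (x - x_{i-1})(x - x_i)\,dx \\
&= \frac{f(x_{i-1})}{2h^2}\left(\frac{2h^3}{3}\right) - \frac{f(x_i)}{h^2}\left(-\frac{4h^3}{3}\right) + \frac{f(x_{i+1})}{2h^2}\left(\frac{2h^3}{3}\right) \\
&= \frac{h}{3}[f(x_{i-1}) + 4f(x_i) + f(x_{i+1})], \qquad i = 1, 3, \ldots, 2n-1. \qquad (12.9)
\end{aligned}$$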


By adding up all the approximations in (12.9) for $i = 1, 3, \ldots, 2n-1$, we obtain

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4\sum_{i=1}^{n} f(x_{2i-1}) + 2\sum_{i=1}^{n-1} f(x_{2i}) + f(x_{2n})\right]. \qquad (12.10)$$


As before, in the case of the trapezoidal method, the accuracy of the approximation in (12.10) can be figured out by using formula (9.15). Courant and John (1965, page 487), however, stated that the error of approximation can be improved by one order of magnitude by using a cubic interpolating polynomial which agrees with f(x) at $x_{i-1}$, $x_i$, $x_{i+1}$ and whose derivative at $x_i$ is equal to $f'(x_i)$. Such a polynomial gives a better approximation to f(x) over $[x_{i-1}, x_{i+1}]$ than the quadratic one, and still provides the same approximation formula (12.10) for the integral. Suppose that $q_i(x)$ is chosen in this way and that $f^{(4)}(x)$ exists and is continuous on [a, b]. Then the error of interpolation resulting from approximating f(x) with $q_i(x)$ over $[x_{i-1}, x_{i+1}]$ is given by $f(x) - q_i(x) = (1/4!)f^{(4)}(\xi_i)(x - x_{i-1})(x - x_i)^2(x - x_{i+1})$, where $x_{i-1} < \xi_i < x_{i+1}$. This is equivalent to using formula (9.15) with n = 3 and with two of the interpolation points coincident at $x_i$. We then have

$$\left|\int_{x_{i-1}}^{x_{i+1}} [f(x) - q_i(x)]\,dx\right| \le \frac{M_4}{4!}\int_{x_{i-1}}^{x_{i+1}} |(x - x_{i-1})(x - x_i)^2(x - x_{i+1})|\,dx, \qquad i = 1, 3, \ldots, 2n-1, \qquad (12.11)$$





where $M_4$ is an upper bound on $|f^{(4)}(x)|$ for $a \le x \le b$. With $u = x - x_i$, the integral in (12.11) equals $\int_{-h}^{h} u^2(h^2 - u^2)\,du = 4h^5/15$. We therefore obtain

$$\left|\int_{x_{i-1}}^{x_{i+1}} [f(x) - q_i(x)]\,dx\right| \le \frac{M_4h^5}{90}, \qquad i = 1, 3, \ldots, 2n-1.$$

Consequently, the total error of approximation in (12.10) is less than or equal to

$$\frac{nM_4h^5}{90} = \frac{M_4(b-a)^5}{2880n^4}, \qquad (12.12)$$

since $h = (b-a)/2n$. Thus the error of approximation with Simpson’s method is $O(1/n^4)$, where n is half the number of subintervals into which [a, b] is divided. Hence, Simpson’s method yields a much more accurate approximation than the trapezoidal method.

As an example, let us apply Simpson’s method to the calculation of the integral in Example 12.1.1 using the same division of [1, 2] into 10 subintervals, each of length h = 1/10. By applying formula (12.10) we obtain

$$\int_1^2 \frac{dx}{x} \approx \frac{0.10}{3}\left[1 + 4\left(\frac{1}{1.1} + \frac{1}{1.3} + \frac{1}{1.5} + \frac{1}{1.7} + \frac{1}{1.9}\right) + 2\left(\frac{1}{1.2} + \frac{1}{1.4} + \frac{1}{1.6} + \frac{1}{1.8}\right) + \frac{1}{2}\right] = 0.69315.$$
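A direct implementation of (12.10), sketched below with an illustrative function name, reproduces this value. It also checks the result against the bound (12.12), using $M_4 = \max_{1 \le x \le 2} 24/x^5 = 24$ and n = 5 (that is, 2n = 10 subintervals).

```python
import numpy as np

def simpson(f, a, b, n):
    """Composite Simpson's rule (12.10) on 2n equal subintervals of [a, b]."""
    x = np.linspace(a, b, 2 * n + 1)      # x_0, x_1, ..., x_{2n}
    y = f(x)
    h = (b - a) / (2 * n)
    return h / 3 * (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum() + y[-1])

approx = simpson(lambda x: 1 / x, 1.0, 2.0, 5)    # 10 subintervals, h = 0.1
bound = 24.0 * (2.0 - 1.0) ** 5 / (2880 * 5**4)   # (12.12) with M_4 = 24
print(approx, abs(approx - np.log(2.0)), bound)   # about 0.693150, 3.1e-6, 1.3e-5
```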


12.3. NEWTON-COTES METHODS

The trapezoidal and Simpson’s methods are two special cases of a general series of approximate integration methods of the so-called Newton-Cotes type. In the trapezoidal method, straight line segments were used to approximate the graph of the function f(x) between a and b. In Simpson’s method, the approximation was carried out using a series of parabolas. We can refine this approximation even further by considering a series of cubic curves, quartic curves, and so on. For cubic approximations, four equally spaced points are used in each subinterval of [a, b], instead of two points for the trapezoidal method and three points for Simpson’s method. Five points are needed for a quartic approximation, and so on. All such approximations are of the Newton-Cotes type.
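One way to see the Newton-Cotes family as a whole is to compute its weights from a single requirement: the rule must integrate $1, x, \ldots, x^m$ exactly on m + 1 equally spaced points. The following sketch does this on [0, 1] (the function name is illustrative). The cases m = 1 and m = 2 recover the trapezoidal and Simpson weights, and m = 3 and m = 4 give the cubic and quartic rules. Solving a Vandermonde system in this way is adequate for small m but becomes ill-conditioned as m grows.

```python
import numpy as np
from fractions import Fraction

def newton_cotes_weights(m):
    """Closed Newton-Cotes weights on [0, 1] with m + 1 equally spaced nodes.

    Solves sum_i w_i * x_i**j = integral_0^1 x**j dx = 1/(j + 1), j = 0, ..., m,
    so the rule is exact for all polynomials of degree <= m.
    """
    nodes = np.linspace(0.0, 1.0, m + 1)
    V = np.vander(nodes, m + 1, increasing=True).T     # V[j, i] = nodes[i] ** j
    moments = 1.0 / np.arange(1, m + 2)
    return np.linalg.solve(V, moments)

for m in (1, 2, 3, 4):
    print(m, [str(Fraction(w).limit_denominator(1000)) for w in newton_cotes_weights(m)])
```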





12.4. GAUSSIAN QUADRATURE 


All Newton-Cotes methods require the use of equally spaced points, as was seen in the cases of the trapezoidal method and Simpson’s method. If this requirement is waived, then it is possible to select the points in a manner that reduces the approximation error.

Let $x_0 < x_1 < x_2 < \cdots < x_n$ be n + 1 distinct points in [a, b]. Consider the approximation

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} \omega_i f(x_i), \qquad (12.13)$$

where the coefficients $\omega_0, \omega_1, \ldots, \omega_n$ are to be determined along with the points $x_0, x_1, \ldots, x_n$. The total number of unknown quantities in (12.13) is 2n + 2. Hence, 2n + 2 conditions must be specified. According to the so-called Gaussian integration rule, the $\omega_i$'s and $x_i$'s are chosen such that the approximation in (12.13) will be exact for all polynomials of degrees not exceeding 2n + 1. This is equivalent to requiring that the approximation (12.13) be exact for $f(x) = x^j$, $j = 0, 1, 2, \ldots, 2n+1$, that is,

$$\int_a^b x^j\,dx = \sum_{i=0}^{n} \omega_i x_i^j, \qquad j = 0, 1, 2, \ldots, 2n+1. \qquad (12.14)$$

This process produces 2n + 2 equations to be solved for the $\omega_i$'s and $x_i$'s. In particular, if the limits of integration are a = −1, b = 1, then it can be shown (see Phillips and Taylor, 1973, page 140) that the $x_i$-values will be the n + 1 zeros of the Legendre polynomial $p_{n+1}(x)$ of degree n + 1 (see Section 10.2). The $\omega_i$-values can then be easily found by solving the system of equations (12.14), which is linear in the $\omega_i$'s.
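The last step can be sketched in NumPy. Take the zeros of $p_{n+1}(x)$ as nodes and solve the first n + 1 equations of (12.14), which are linear in the ω's. The function name is illustrative. Obtaining the zeros through `numpy.polynomial.Legendre.basis(n + 1).roots()` is an implementation choice, not part of the text. With these particular nodes, the remaining equations of (12.14) then hold as well. This can be checked for n = 2 by comparing $\sum_i \omega_i x_i^4$ with $\int_{-1}^1 x^4\,dx = 2/5$.

```python
import numpy as np
from numpy.polynomial import Legendre

def gauss_legendre_by_moments(n):
    """Nodes: zeros of p_{n+1}. Weights: solve the first n + 1 equations of (12.14) on [-1, 1]."""
    nodes = np.sort(Legendre.basis(n + 1).roots().real)
    j = np.arange(n + 1)
    moments = (1.0 - (-1.0) ** (j + 1)) / (j + 1)      # integral of x**j over [-1, 1]
    V = np.vander(nodes, n + 1, increasing=True).T     # V[j, i] = nodes[i] ** j
    return nodes, np.linalg.solve(V, moments)

for n in (1, 2):
    nodes, w = gauss_legendre_by_moments(n)
    print(n, nodes, w)       # n = 1: weights 1, 1;  n = 2: weights 5/9, 8/9, 5/9

nodes, w = gauss_legendre_by_moments(2)
print(np.sum(w * nodes**4), 2 / 5)   # a degree-4 moment that was not imposed also matches
```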

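For example, for n = 1, the zeros of $p_2(x) = \frac{1}{2}(3x^2 - 1)$ are $x_0 = -1/\sqrt{3}$, $x_1 = 1/\sqrt{3}$. Applying (12.14), we obtain

$$\begin{aligned}
\int_{-1}^{1} dx &= \omega_0 + \omega_1 &&\Rightarrow\quad \omega_0 + \omega_1 = 2, \\
\int_{-1}^{1} x\,dx &= \omega_0x_0 + \omega_1x_1 &&\Rightarrow\quad \frac{1}{\sqrt{3}}(\omega_1 - \omega_0) = 0, \\
\int_{-1}^{1} x^2\,dx &= \omega_0x_0^2 + \omega_1x_1^2 &&\Rightarrow\quad \frac{1}{3}(\omega_0 + \omega_1) = \frac{2}{3}, \\
\int_{-1}^{1} x^3\,dx &= \omega_0x_0^3 + \omega_1x_1^3 &&\Rightarrow\quad \frac{1}{3\sqrt{3}}(\omega_1 - \omega_0) = 0.
\end{aligned}$$

We note that the last two equations are equivalent to the first two. Solving the latter for $\omega_0$ and $\omega_1$, we get $\omega_0 = \omega_1 = 1$. Hence, we have the approximation

$$\int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right),$$

which is exact if f(x) is a polynomial of degree not exceeding 2n + 1 = 3.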

If the limits of integration are not equal to −1, 1, we can easily convert the integral $\int_a^b f(x)\,dx$ to one with the limits −1, 1 by making the change of variable

$$z = \frac{2x - (a+b)}{b-a}.$$

This converts the general integral $\int_a^b f(x)\,dx$ to the integral $[(b-a)/2]\int_{-1}^{1} g(z)\,dz$, where

$$g(z) = f\left[\frac{(b-a)z + b + a}{2}\right].$$

We therefore have the approximation

$$\int_a^b f(x)\,dx \approx \frac{b-a}{2}\sum_{i=0}^{n} \omega_i g(z_i), \qquad (12.15)$$

where the $z_i$'s are the zeros of the Legendre polynomial $p_{n+1}(z)$.

It can be shown (see Davis and Rabinowitz, 1975, page 75) that when a = −1, b = 1, the error of approximation in (12.13) is given by

$$\int_a^b f(x)\,dx - \sum_{i=0}^{n} \omega_i f(x_i) = \frac{(b-a)^{2n+3}[(n+1)!]^4}{(2n+3)[(2n+2)!]^3}f^{(2n+2)}(\xi), \qquad a < \xi < b, \qquad (12.16)$$

provided that $f^{(2n+2)}(x)$ is continuous on [a, b]. This error decreases rapidly as n increases. Thus this Gaussian quadrature provides a very good approximation with a formula of the type given in (12.13).

There are several extensions of the approximation in (12.13). These extensions are of the form

$$\int_a^b \lambda(x)f(x)\,dx \approx \sum_{i=0}^{n} \omega_i f(x_i), \qquad (12.17)$$

where λ(x) is a particular positive weight function. As before, the coefficients $\omega_0, \omega_1, \ldots, \omega_n$ and the points $x_0, x_1, \ldots, x_n$, which belong to [a, b], are chosen so that (12.17) is exact for all polynomials of degrees not exceeding 2n + 1. The choice of the $x_i$'s depends on the form of λ(x). It can be shown that the values of $x_i$ are the zeros of a polynomial of degree n + 1 belonging to the sequence of polynomials that are orthogonal on [a, b] with respect to λ(x) (see Davis and Rabinowitz, 1975, page 74; Phillips and Taylor, 1973, page 142). For example, if a = −1, b = 1, $\lambda(x) = (1-x)^{\alpha}(1+x)^{\beta}$, α > −1, β > −1, then the $x_i$'s are the zeros of the Jacobi polynomial $p_{n+1}^{(\alpha,\beta)}(x)$ (see Section 10.3). Also, if a = −1, b = 1, $\lambda(x) = (1-x^2)^{-1/2}$, then the $x_i$'s are the zeros of the Chebyshev polynomial of the first kind, $T_{n+1}(x)$ (see Section 10.4), and so on. For those two cases, formula (12.17) is called the Gauss-Jacobi quadrature formula and the Gauss-Chebyshev quadrature formula, respectively. The choice λ(x) = 1 results in the original formula (12.13), which is now referred to as the Gauss-Legendre quadrature formula.

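Example 12.4.1. Consider the integral $\int_0^1 dx/(1+x)$, which has the exact value log 2 = 0.69314718. Applying formula (12.15), we get

$$\int_0^1 \frac{dx}{1+x} = \frac{1}{2}\int_{-1}^{1} \frac{dz}{1 + \frac{1}{2}(z+1)} \approx \sum_{i=0}^{n} \frac{\omega_i}{3 + z_i}, \qquad -1 < z_i < 1, \qquad (12.18)$$

where the $z_i$'s are the zeros of the Legendre polynomial $p_{n+1}(z)$, z = 2x − 1.

Let n = 1; then $p_2(z) = \frac{1}{2}(3z^2 - 1)$ with zeros equal to $z_0 = -1/\sqrt{3}$, $z_1 = 1/\sqrt{3}$. We have seen earlier that $\omega_0 = 1$, $\omega_1 = 1$; hence, from (12.18), we obtain

$$\int_0^1 \frac{dx}{1+x} \approx \sum_{i=0}^{1} \frac{\omega_i}{3 + z_i} = \frac{\sqrt{3}}{3\sqrt{3} - 1} + \frac{\sqrt{3}}{3\sqrt{3} + 1} = 0.692307692.$$

Let us now use n = 2 in (12.18). Then $p_3(z) = \frac{1}{2}(5z^3 - 3z)$ (see Section 10.2). Its zeros are $z_0 = -(3/5)^{1/2}$, $z_1 = 0$, $z_2 = (3/5)^{1/2}$. To find the $\omega_i$'s, we apply formula (12.14) using a = −1, b = 1, and z in place of x. For j = 0, 1, 2, 3, 4, 5, we have

$$\begin{aligned}
\int_{-1}^{1} dz &= \omega_0 + \omega_1 + \omega_2, \\
\int_{-1}^{1} z\,dz &= \omega_0z_0 + \omega_1z_1 + \omega_2z_2, \\
\int_{-1}^{1} z^2\,dz &= \omega_0z_0^2 + \omega_1z_1^2 + \omega_2z_2^2, \\
\int_{-1}^{1} z^3\,dz &= \omega_0z_0^3 + \omega_1z_1^3 + \omega_2z_2^3, \\
\int_{-1}^{1} z^4\,dz &= \omega_0z_0^4 + \omega_1z_1^4 + \omega_2z_2^4, \\
\int_{-1}^{1} z^5\,dz &= \omega_0z_0^5 + \omega_1z_1^5 + \omega_2z_2^5.
\end{aligned}$$

These equations can be written as

$$\begin{aligned}
\omega_0 + \omega_1 + \omega_2 &= 2, \\
(3/5)^{1/2}(-\omega_0 + \omega_2) &= 0, \\
\tfrac{3}{5}(\omega_0 + \omega_2) &= \tfrac{2}{3}, \\
(3/5)^{3/2}(-\omega_0 + \omega_2) &= 0, \\
\tfrac{9}{25}(\omega_0 + \omega_2) &= \tfrac{2}{5}, \\
(3/5)^{5/2}(-\omega_0 + \omega_2) &= 0.
\end{aligned}$$

The above six equations can be reduced to only three that are linearly independent, namely,

$$\omega_0 + \omega_1 + \omega_2 = 2, \qquad -\omega_0 + \omega_2 = 0, \qquad \omega_0 + \omega_2 = \frac{10}{9},$$

the solution of which is $\omega_0 = \frac{5}{9}$, $\omega_1 = \frac{8}{9}$, $\omega_2 = \frac{5}{9}$. Substituting the $\omega_i$'s and $z_i$'s in (12.18), we obtain

$$\int_0^1 \frac{dx}{1+x} \approx \frac{\omega_0}{3 + z_0} + \frac{\omega_1}{3 + z_1} + \frac{\omega_2}{3 + z_2} = \frac{5/9}{3 - (3/5)^{1/2}} + \frac{8/9}{3} + \frac{5/9}{3 + (3/5)^{1/2}} = 0.693121693.$$

For higher values of n, the zeros of Legendre polynomials and the corresponding values of $\omega_i$ can be found, for example, in Shoup (1984, Table 7.5) and Krylov (1962, Appendix A).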
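NumPy provides Gauss-Legendre nodes and weights directly through `numpy.polynomial.legendre.leggauss`. The sketch below (function name illustrative) combines them with the change of variable behind (12.15). It reproduces the two values of Example 12.4.1, which equal 9/13 and 131/189 exactly, and adds two larger values of n for comparison.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """Approximation (12.15) using the n + 1 zeros of p_{n+1} and their weights."""
    z, w = np.polynomial.legendre.leggauss(n + 1)   # nodes and weights on [-1, 1]
    x = ((b - a) * z + b + a) / 2                   # map the nodes back to [a, b]
    return (b - a) / 2 * np.sum(w * f(x))

def f(x):
    return 1 / (1 + x)

for n in (1, 2, 3, 4):
    approx = gauss_legendre(f, 0.0, 1.0, n)
    print(n, approx, abs(approx - np.log(2.0)))
```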


12.5. APPROXIMATION OVER AN INFINITE INTERVAL 


Consider an integral of the form $\int_a^{\infty} f(x)\,dx$, which is improper of the first kind (see Section 6.5). It can be approximated by using the integral $\int_a^b f(x)\,dx$ for a sufficiently large value of b, provided, of course, that $\int_a^{\infty} f(x)\,dx$ is convergent. The methods discussed earlier in Sections 12.1-12.4 can then be applied to $\int_a^b f(x)\,dx$.

For improper integrals of the first kind, of the form $\int_0^{\infty} \lambda(x)f(x)\,dx$ and $\int_{-\infty}^{\infty} \lambda(x)f(x)\,dx$, we have the following Gaussian approximations:

$$\int_0^{\infty} \lambda(x)f(x)\,dx \approx \sum_{i=0}^{n} \omega_i f(x_i), \qquad (12.19)$$

$$\int_{-\infty}^{\infty} \lambda(x)f(x)\,dx \approx \sum_{i=0}^{n} \omega_i f(x_i), \qquad (12.20)$$

where, as before, the $x_i$'s and $\omega_i$'s are chosen so that (12.19) and (12.20) are exact for all polynomials of degrees not exceeding 2n + 1. For the weight function λ(x) in (12.19), the choice $\lambda(x) = e^{-x}$ gives the Gauss-Laguerre quadrature. For this quadrature, the $x_i$'s are the zeros of the Laguerre polynomial $L_{n+1}(x)$ of degree n + 1 and α = 0 (see Section 10.6). The associated error of approximation is given by (see Davis and Rabinowitz, 1975, page 173)

$$\int_0^{\infty} e^{-x}f(x)\,dx - \sum_{i=0}^{n} \omega_i f(x_i) = \frac{[(n+1)!]^2}{(2n+2)!}f^{(2n+2)}(\xi), \qquad 0 < \xi < \infty.$$

Choosing $\lambda(x) = e^{-x^2}$ in (12.20) gives the Gauss-Hermite quadrature, and the corresponding $x_i$'s are the zeros of the Hermite polynomial $H_{n+1}(x)$ of degree n + 1 (see Section 10.5). The associated error of approximation is of the form (see Davis and Rabinowitz, 1975, page 174)

$$\int_{-\infty}^{\infty} e^{-x^2}f(x)\,dx - \sum_{i=0}^{n} \omega_i f(x_i) = \frac{(n+1)!\sqrt{\pi}}{2^{n+1}(2n+2)!}f^{(2n+2)}(\xi), \qquad -\infty < \xi < \infty.$$





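We can also use the Gauss-Laguerre and the Gauss-Hermite quadrature formulas to approximate convergent integrals of the form $\int_0^{\infty} f(x)\,dx$ and $\int_{-\infty}^{\infty} f(x)\,dx$:

$$\int_0^{\infty} f(x)\,dx = \int_0^{\infty} e^{-x}[e^{x}f(x)]\,dx \approx \sum_{i=0}^{n} \omega_i e^{x_i}f(x_i),$$

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} e^{-x^2}[e^{x^2}f(x)]\,dx \approx \sum_{i=0}^{n} \omega_i e^{x_i^2}f(x_i).$$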


Example 12.5.1. Consider the integral

$$I = \int_0^{\infty} e^{-x}f(x)\,dx \approx \sum_{i=0}^{n} \omega_i f(x_i), \qquad (12.21)$$

where $f(x) = x(1 - e^{-2x})^{-1}$ and the $x_i$'s are the zeros of the Laguerre polynomial $L_{n+1}(x)$. To find expressions for $L_n(x)$, n = 0, 1, 2, ..., we can use the recurrence relation (10.37) with α = 0, which can be written as

$$L_{n+1}(x) = (x - n - 1)L_n(x) - x\frac{dL_n(x)}{dx}, \qquad n = 0, 1, 2, \ldots. \qquad (12.22)$$

Recall that $L_0(x) = 1$. Choosing n = 1 in (12.21), we get

$$I \approx \omega_0 f(x_0) + \omega_1 f(x_1). \qquad (12.23)$$

From (12.22) we have

$$\begin{aligned}
L_1(x) &= (x - 1)L_0(x) = x - 1, \\
L_2(x) &= (x - 2)L_1(x) - x\frac{d(x-1)}{dx} = (x - 2)(x - 1) - x = x^2 - 4x + 2.
\end{aligned}$$

The zeros of $L_2(x)$ are $x_0 = 2 - \sqrt{2}$, $x_1 = 2 + \sqrt{2}$. To find $\omega_0$ and $\omega_1$, formula (12.19) must be exact for all polynomials of degrees not exceeding 2n + 1 = 3. This is equivalent to requiring that

$$\begin{aligned}
\int_0^{\infty} e^{-x}\,dx &= \omega_0 + \omega_1, \\
\int_0^{\infty} e^{-x}x\,dx &= \omega_0x_0 + \omega_1x_1, \\
\int_0^{\infty} e^{-x}x^2\,dx &= \omega_0x_0^2 + \omega_1x_1^2, \\
\int_0^{\infty} e^{-x}x^3\,dx &= \omega_0x_0^3 + \omega_1x_1^3.
\end{aligned}$$

Only two equations are linearly independent, the solution of which is $\omega_0 = 0.853553$, $\omega_1 = 0.146447$. From (12.23) we then have

$$I \approx \frac{\omega_0x_0}{1 - e^{-2x_0}} + \frac{\omega_1x_1}{1 - e^{-2x_1}} = 1.225054.$$

Let us now calculate (12.21) using n = 2, 3, 4. The zeros of the Laguerre polynomials $L_3(x)$, $L_4(x)$, $L_5(x)$, and the values of $\omega_i$ for each n are shown in Table 12.1. These values are given in Ralston and Rabinowitz (1978, page 106) and also in Krylov (1962, Appendix C). The corresponding approximate values of I are given in Table 12.1. It can be shown that the exact value of I is $\pi^2/8 = 1.2337$.


Table 12.1. Zeros of Laguerre Polynomials ($x_i$), Values of $\omega_i$, and the Corresponding Approximate Values$^a$ of $I$

$n$    $x_i$         $\omega_i$    $I$
1      0.585786      0.853553      1.225054
       3.414214      0.146447
2      0.415775      0.711093      1.234538
       2.294280      0.278518
       6.289945      0.010389
3      0.322548      0.603154      1.234309
       1.745761      0.357419
       4.536620      0.038888
       9.395071      0.000539
4      0.263560      0.521756      1.233793
       1.413403      0.398667
       3.596426      0.075942
       7.085810      0.003612
      12.640801      0.000023

$^a$See (12.21).
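Table 12.1 can be regenerated with the sketch below, which assumes NumPy's `laggauss`. That routine returns the zeros of the Laguerre polynomial of the requested degree together with the matching coefficients. The last printed digit may differ slightly from the table, because the tabulated nodes and weights are rounded to six decimals.

```python
import math

import numpy as np
from numpy.polynomial.laguerre import laggauss


def f(x):
    return x / (1.0 - np.exp(-2.0 * x))      # f(x) = x (1 - e^{-2x})^{-1}


exact = math.pi ** 2 / 8
results = {}
for n in range(1, 5):
    x, w = laggauss(n + 1)                   # zeros of L_{n+1}(x) and matching omega_i
    results[n] = float(np.sum(w * f(x)))
    print(n, np.round(x, 6), np.round(w, 6), round(results[n], 6))
print("exact value pi^2/8 =", exact)
```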





12.6. THE METHOD OF LAPLACE 


This method is used to approximate integrals of the form

$$I(\lambda) = \int_a^b \varphi(x)\,e^{\lambda h(x)}\,dx, \qquad (12.24)$$

where $\lambda$ is a large positive constant, $\varphi(x)$ is continuous on $[a, b]$, and the first and second derivatives of $h(x)$ are continuous on $[a, b]$. The limits $a$ and $b$ may be finite or infinite. This integral was used by Laplace in his original development of the central limit theorem (see Section 4.5.1). More specifically, if $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of independent and identically distributed random variables with a common density function, then the density function of the sum $S_n = \sum_{i=1}^{n} X_i$ can be represented in the form (12.24) (see Wong, 1989, Chapter 2).

Suppose that $h(x)$ has a single maximum in the interval $[a, b]$ at $x = t$, $a < t < b$, where $h'(t) = 0$ and $h''(t) < 0$. Hence, $e^{\lambda h(x)}$ is maximized at $t$ for any $\lambda > 0$. Suppose further that $e^{\lambda h(x)}$ becomes very strongly peaked at $x = t$ and decreases rapidly away from $x = t$ on $[a, b]$ as $\lambda \to \infty$. In this case, the major portion of $I(\lambda)$ comes from integrating the function $\varphi(x)e^{\lambda h(x)}$ over a small neighborhood around $x = t$. Under these conditions, it can be shown that if $a < t < b$, then as $\lambda \to \infty$,

$$I(\lambda) \sim \varphi(t)\,e^{\lambda h(t)}\left[\frac{-2\pi}{\lambda h''(t)}\right]^{1/2}, \qquad (12.25)$$

where $\sim$ denotes asymptotic equality (see Section 3.3). Formula (12.25) is known as Laplace's approximation.

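A heuristic derivation of (12.25) can be obtained by replacing $\varphi(x)$ and $h(x)$ by the leading terms in their Taylor's series expansions around $x = t$. The linear term of $h$ vanishes because $h'(t) = 0$. The integration limits are then extended to $-\infty$ and $\infty$, that is,

$$\int_a^b \varphi(x)e^{\lambda h(x)}\,dx \approx \int_a^b \varphi(t)\exp\!\left[\lambda h(t) + \frac{\lambda}{2}(x - t)^2 h''(t)\right]dx$$

$$\approx \int_{-\infty}^{\infty} \varphi(t)\exp\!\left[\lambda h(t) + \frac{\lambda}{2}(x - t)^2 h''(t)\right]dx \qquad (12.26)$$

$$= \varphi(t)\,e^{\lambda h(t)}\int_{-\infty}^{\infty}\exp\!\left[\frac{\lambda}{2}(x - t)^2 h''(t)\right]dx$$

$$= \varphi(t)\,e^{\lambda h(t)}\left[\frac{-2\pi}{\lambda h''(t)}\right]^{1/2}. \qquad (12.27)$$

Formula (12.27) follows from (12.26) by making use of the fact that $\int_0^\infty e^{-x^2}\,dx = \frac{1}{2}\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}/2$, where $\Gamma(\cdot)$ is the gamma function (see Example 6.9.6), or by simply evaluating the integral of a normal density function.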





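If $t = a$ (with $h'(a) = 0$ and $h''(a) < 0$), then it can be shown that as $\lambda \to \infty$,

$$I(\lambda) \sim \varphi(a)\,e^{\lambda h(a)}\left[\frac{-\pi}{2\lambda h''(a)}\right]^{1/2}. \qquad (12.28)$$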


Rigorous proofs of (12.27) and (12.28) can be found in Wong (1989, 
Chapter 2), Copson (1965, Chapter 5), Fulks (1978, Chapter 18), and Lange 
(1999, Chapter 4). 

Example 12.6.1. Consider the gamma function

$$\Gamma(n + 1) = \int_0^\infty e^{-x}x^n\,dx, \qquad n > -1. \qquad (12.29)$$

Let us find an approximation for $\Gamma(n + 1)$ when $n$ is large and positive, but not necessarily an integer. Let $x = nz$; then (12.29) can be written as

$$\Gamma(n + 1) = n\int_0^\infty e^{-nz}\exp[n\log(nz)]\,dz$$

$$= n\int_0^\infty e^{-nz}\exp[n\log n + n\log z]\,dz$$

$$= n^{n+1}\int_0^\infty \exp[n(-z + \log z)]\,dz. \qquad (12.30)$$

Let $h(z) = -z + \log z$. Then $h(z)$ has a unique maximum at $z = 1$ with $h'(1) = 0$ and $h''(1) = -1$. Applying formula (12.27) to (12.30), with $\lambda = n$, $\varphi(z) = 1$, and $h(1) = -1$, we obtain

$$\Gamma(n + 1) \sim n^{n+1}e^{-n}\left[\frac{-2\pi}{n(-1)}\right]^{1/2} = e^{-n}n^n\sqrt{2\pi n}, \qquad (12.31)$$

as $n \to \infty$. Formula (12.31) is known as Stirling's formula.
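Formula (12.31) can be compared with the gamma function itself. This sketch uses only the standard library. Because `math.lgamma` returns $\log\Gamma(x)$, the comparison is done on the log scale to avoid overflow. The ratio $\Gamma(n + 1)/(e^{-n}n^n\sqrt{2\pi n})$ is expected to decrease toward 1 as $n$ grows, roughly like $1 + 1/(12n)$, the next term of Stirling's series. The function name `log_stirling` is hypothetical.

```python
import math


def log_stirling(n):
    """log of e^{-n} n^n sqrt(2 pi n), formula (12.31); n > 0 need not be an integer."""
    return -n + n * math.log(n) + 0.5 * math.log(2.0 * math.pi * n)


for n in (5, 10.5, 50, 200):
    ratio = math.exp(math.lgamma(n + 1) - log_stirling(n))   # Gamma(n+1) / Stirling
    print(n, ratio, 1 + 1 / (12 * n))
```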
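Example 12.6.2. Consider the integral

$$I_n(\lambda) = \frac{1}{\pi}\int_0^\pi \exp(\lambda\cos x)\cos nx\,dx,$$

as $\lambda \to \infty$. This integral looks like (12.24) with $h(x) = \cos x$, which has a single maximum at $x = 0$ in $[0, \pi]$, and with $\varphi(x) = \cos nx$, so that $\varphi(0) = 1$. Since $h'(0) = 0$ and $h''(0) = -1$, by applying (12.28) we obtain, as $\lambda \to \infty$,

$$I_n(\lambda) \sim \frac{1}{\pi}\,e^{\lambda}\left[\frac{-\pi}{2\lambda(-1)}\right]^{1/2} = \frac{e^{\lambda}}{\sqrt{2\pi\lambda}}.$$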


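Example 12.6.3. Consider the integral

$$I(\lambda) = \int_0^\pi e^{\lambda\sin x}\,dx$$

as $\lambda \to \infty$. Here, $h(x) = \sin x$, which has a single maximum at $x = \pi/2$ in $[0, \pi]$, and $h''(\pi/2) = -1$. From (12.27), we get

$$I(\lambda) \sim e^{\lambda}\left[\frac{-2\pi}{\lambda(-1)}\right]^{1/2} = e^{\lambda}\sqrt{\frac{2\pi}{\lambda}},$$

as $\lambda \to \infty$.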
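Formulas (12.25) and (12.28) can be checked against a direct numerical evaluation. This standard-library sketch uses a composite Simpson rule to obtain reference values, and the function names are hypothetical. Example 12.6.3 is evaluated with $\varphi(x) = 1$, which is an assumption about its integrand. Example 12.6.2 is evaluated with $n = 0$. The printed ratios (numerical integral divided by the Laplace approximation) should drift toward 1 as $\lambda$ grows, roughly like $1 + 1/(8\lambda)$.

```python
import math


def simpson(g, a, b, m=4000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    step = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * step)
    return total * step / 3


def laplace_interior(phi_t, h_t, h2_t, lam):
    """(12.25): phi(t) e^{lam h(t)} [-2 pi / (lam h''(t))]^{1/2}, interior maximum."""
    return phi_t * math.exp(lam * h_t) * math.sqrt(-2.0 * math.pi / (lam * h2_t))


def laplace_endpoint(phi_a, h_a, h2_a, lam):
    """(12.28): phi(a) e^{lam h(a)} [-pi / (2 lam h''(a))]^{1/2}, maximum at t = a."""
    return phi_a * math.exp(lam * h_a) * math.sqrt(-math.pi / (2.0 * lam * h2_a))


ratios = {}
for lam in (10.0, 50.0, 200.0):
    # Example 12.6.3, taking phi(x) = 1 (assumed): int_0^pi exp(lam sin x) dx
    r63 = simpson(lambda x: math.exp(lam * math.sin(x)), 0.0, math.pi) / laplace_interior(
        1.0, 1.0, -1.0, lam)
    # Example 12.6.2 with n = 0, multiplied through by pi: int_0^pi exp(lam cos x) dx
    r62 = simpson(lambda x: math.exp(lam * math.cos(x)), 0.0, math.pi) / laplace_endpoint(
        1.0, 1.0, -1.0, lam)
    ratios[lam] = (r63, r62)
    print(lam, r63, r62)
```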



12.7. MULTIPLE INTEGRALS 


We recall that integration of a multivariable function was discussed in Section 7.9. In the present section, we consider approximate integration formulas for an $n$-tuple Riemann integral over a region $D$ in an $n$-dimensional Euclidean space $R^n$.

For example, let us consider the double integral $I = \iint_D f(x_1, x_2)\,dx_1\,dx_2$, where $D \subset R^2$ is the region

$$D = \{(x_1, x_2) \mid a \le x_1 \le b,\ \psi(x_1) \le x_2 \le \phi(x_1)\}.$$

Then

$$I = \int_a^b\left[\int_{\psi(x_1)}^{\phi(x_1)} f(x_1, x_2)\,dx_2\right]dx_1 = \int_a^b g(x_1)\,dx_1, \qquad (12.32)$$

where

$$g(x_1) = \int_{\psi(x_1)}^{\phi(x_1)} f(x_1, x_2)\,dx_2. \qquad (12.33)$$





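Let us now apply a Gaussian integration rule to (12.32) using the points $x_1 = z_0, z_1, \ldots, z_m$ with the matching coefficients $\omega_0, \omega_1, \ldots, \omega_m$. Thus

$$I \approx \sum_{i=0}^{m}\omega_i\int_{\psi(z_i)}^{\phi(z_i)} f(z_i, x_2)\,dx_2. \qquad (12.34)$$

For the $i$th of the $m + 1$ integrals in (12.34), we can apply a Gaussian integration rule using the points $y_{i0}, y_{i1}, \ldots, y_{in}$ with the corresponding coefficients $v_{i0}, v_{i1}, \ldots, v_{in}$. We then have

$$\int_{\psi(z_i)}^{\phi(z_i)} f(z_i, x_2)\,dx_2 \approx \sum_{j=0}^{n} v_{ij}\,f(z_i, y_{ij}).$$

Hence,

$$I \approx \sum_{i=0}^{m}\sum_{j=0}^{n}\omega_i v_{ij}\,f(z_i, y_{ij}).$$

This procedure can obviously be generalized to higher-order multiple integrals. More details can be found in Stroud (1971) and Davis and Rabinowitz (1975, Chapter 5).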
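The nested rule in (12.34) can be sketched with Gauss-Legendre points on each interval. This sketch assumes NumPy's `leggauss`, which gives points and coefficients on $[-1, 1]$. Those points are mapped linearly to $[a, b]$ and to each inner interval $[\psi(z_i), \phi(z_i)]$. The inner coefficients are called $v_{ij}$ here, and both that naming and the triangular test region are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial.legendre import leggauss


def gauss_on(lo, hi, n_points):
    """Gauss-Legendre points and coefficients mapped from [-1, 1] to [lo, hi]."""
    x, w = leggauss(n_points)
    half = 0.5 * (hi - lo)
    return lo + half * (x + 1.0), half * w


def double_integral(f, a, b, psi, phi, m_points, n_points):
    """I = int_a^b int_{psi(x1)}^{phi(x1)} f(x1, x2) dx2 dx1 by nested Gauss rules."""
    z, omega = gauss_on(a, b, m_points)                    # outer points z_i, omega_i
    total = 0.0
    for z_i, omega_i in zip(z, omega):
        y, v = gauss_on(psi(z_i), phi(z_i), n_points)      # inner points y_ij, v_ij
        total += omega_i * np.sum(v * f(z_i, y))
    return float(total)


# Triangle 0 <= x2 <= x1 <= 1 with f = exp(x1 + x2); exact value (e - 1)^2 / 2
approx = double_integral(lambda x1, x2: np.exp(x1 + x2), 0.0, 1.0,
                         lambda x1: 0.0, lambda x1: x1, 10, 10)
print(approx, (math.e - 1) ** 2 / 2)
```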

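The method of Laplace in Section 12.6 can be extended to an $n$-dimensional integral of the form

$$I(\lambda) = \int_D \varphi(\mathbf{x})\,e^{\lambda h(\mathbf{x})}\,d\mathbf{x},$$

which is a multidimensional version of the integral in (12.24). Here, $D$ is a region in $R^n$, which may be bounded or unbounded, $\lambda$ is a large positive constant, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. As before, it is assumed that:

a. $\varphi(\mathbf{x})$ is continuous in $D$.

b. $h(\mathbf{x})$ has continuous first-order and second-order partial derivatives with respect to $x_1, x_2, \ldots, x_n$ in $D$.

c. $h(\mathbf{x})$ has a single maximum in $D$ at $\mathbf{x} = \mathbf{t}$.

If $\mathbf{t}$ is an interior point of $D$, then it is also a stationary point of $h(\mathbf{x})$, that is, $\partial h/\partial x_i|_{\mathbf{x} = \mathbf{t}} = 0$, $i = 1, 2, \ldots, n$, since $\mathbf{t}$ is a point of maximum and the partial derivatives of $h(\mathbf{x})$ exist. Suppose further that the Hessian matrix,

$$\mathbf{H}_h(\mathbf{t}) = \left[\frac{\partial^2 h(\mathbf{x})}{\partial x_i\,\partial x_j}\right]_{\mathbf{x} = \mathbf{t}},$$

is negative definite. Then, for large $\lambda$, $I(\lambda)$ is approximately equal to

$$\left(\frac{2\pi}{\lambda}\right)^{n/2}\varphi(\mathbf{t})\,\{\det[-\mathbf{H}_h(\mathbf{t})]\}^{-1/2}\,e^{\lambda h(\mathbf{t})}.$$

A proof of this approximation can be found in Wong (1989, Section 9.5). We note that this expression is a generalization of Laplace's formula (12.25) to an $n$-tuple Riemann integral.

If $\mathbf{t}$ happens to be on the boundary of $D$ and still satisfies the conditions that $\partial h/\partial x_i|_{\mathbf{x} = \mathbf{t}} = 0$ for $i = 1, 2, \ldots, n$, and $\mathbf{H}_h(\mathbf{t})$ is negative definite, then it can be shown that for large $\lambda$,

$$I(\lambda) \approx \frac{1}{2}\left(\frac{2\pi}{\lambda}\right)^{n/2}\varphi(\mathbf{t})\,\{\det[-\mathbf{H}_h(\mathbf{t})]\}^{-1/2}\,e^{\lambda h(\mathbf{t})},$$

which is one-half of the previous approximation for $I(\lambda)$ (see Wong, 1989, page 498).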
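The $n$-dimensional formula can be checked on a two-dimensional example. This sketch assumes NumPy. The function $h(x_1, x_2) = -(x_1^2 + x_1x_2 + x_2^2) - x_1^4/4$ is a hypothetical test function whose single maximum is at $\mathbf{t} = \mathbf{0}$, with $\det[-\mathbf{H}_h(\mathbf{t})] = 3$. The example also takes $\varphi = 1$ and $D = R^2$. The reference value is a fine Riemann sum over $[-2, 2]^2$; outside that square the integrand is negligible for these $\lambda$ values.

```python
import math

import numpy as np


def laplace_nd(phi_t, h_t, hessian, lam):
    """(2 pi/lam)^{n/2} phi(t) {det[-H_h(t)]}^{-1/2} e^{lam h(t)} for an interior maximum t."""
    n = hessian.shape[0]
    det = float(np.linalg.det(-hessian))     # positive when H_h(t) is negative definite
    return (2.0 * math.pi / lam) ** (n / 2) * phi_t * math.exp(lam * h_t) / math.sqrt(det)


def h(x1, x2):
    # hypothetical test function with a single maximum at t = (0, 0), h(t) = 0
    return -(x1 ** 2 + x1 * x2 + x2 ** 2) - x1 ** 4 / 4.0


H_t = np.array([[-2.0, -1.0], [-1.0, -2.0]])   # Hessian of h at t = (0, 0)


def grid_integral(lam, half_width=2.0, step=0.005):
    """Riemann sum of exp(lam h) over the square [-2, 2]^2 (phi = 1)."""
    g = np.arange(-half_width, half_width + step / 2, step)
    X1, X2 = np.meshgrid(g, g, indexing="ij")
    return float(np.exp(lam * h(X1, X2)).sum() * step ** 2)


ratios = {}
for lam in (20.0, 50.0, 200.0):
    ratios[lam] = grid_integral(lam) / laplace_nd(1.0, 0.0, H_t, lam)
    print(lam, ratios[lam])   # expected to move toward 1 as lam grows
```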




12.8. THE MONTE CARLO METHOD 

A new approach to approximate integration arose in the 1940s as part of the Monte Carlo method of S. Ulam and J. von Neumann (Haber, 1970). The basic idea of the Monte Carlo method for integrals is as follows. Suppose that we need to compute the integral

$$I = \int_a^b f(x)\,dx. \qquad (12.35)$$

We consider $I$ as the expected value of a certain stochastic process. An estimate of $I$ can be obtained by random sampling from this process, and the estimate is then used as an approximation to $I$. For example, let $X$ be a continuous random variable that has the uniform distribution $U(a, b)$ over the interval $[a, b]$. The expected value of $f(X)$ is

$$E[f(X)] = \frac{1}{b - a}\int_a^b f(x)\,dx = \frac{I}{b - a}.$$





Let $x_1, x_2, \ldots, x_n$ be a random sample from $U(a, b)$. An estimate of $E[f(X)]$ is given by $(1/n)\sum_{i=1}^{n} f(x_i)$. Hence, an approximate value of $I$, denoted by $\hat{I}_n$, can be obtained as

$$\hat{I}_n = \frac{b - a}{n}\sum_{i=1}^{n} f(x_i). \qquad (12.36)$$


The justification for using $\hat{I}_n$ as an approximation to $I$ is that $\hat{I}_n$ is a consistent estimator of $I$, that is, for a given $\epsilon > 0$,

$$\lim_{n \to \infty} P(|\hat{I}_n - I| \ge \epsilon) = 0.$$

This is true because $(1/n)\sum_{i=1}^{n} f(x_i)$ converges in probability to $E[f(X)]$ as $n \to \infty$, according to the law of large numbers (see Sections 3.7 and 5.6.3). In other words, the probability that $\hat{I}_n$ will differ from $I$ by $\epsilon$ or more can be made arbitrarily close to zero if $n$ is chosen large enough. In fact, we even have the stronger result that $\hat{I}_n$ converges strongly, or almost surely, to $I$ as $n \to \infty$, by the strong law of large numbers (see Section 5.6.3).

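The closeness of $\hat{I}_n$ to $I$ depends on the variance of $\hat{I}_n$, which is equal to

$$\operatorname{Var}(\hat{I}_n) = (b - a)^2\operatorname{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} f(x_i)\right] = \frac{(b - a)^2\sigma_f^2}{n}, \qquad (12.37)$$

where $\sigma_f^2$ is the variance of the random variable $f(X)$, that is,

$$\sigma_f^2 = \operatorname{Var}[f(X)] = E[f^2(X)] - \{E[f(X)]\}^2 = \frac{1}{b - a}\int_a^b f^2(x)\,dx - \left(\frac{I}{b - a}\right)^2. \qquad (12.38)$$

By the central limit theorem (see Section 4.5.1), if $n$ is large enough, then $\hat{I}_n$ is approximately normally distributed with mean $I$ and variance $(1/n)(b - a)^2\sigma_f^2$. Thus,

$$\frac{\hat{I}_n - I}{(b - a)\sigma_f/\sqrt{n}} \xrightarrow{d} Z,$$

where $Z$ has the standard normal distribution, and the symbol $\xrightarrow{d}$ denotes convergence in distribution (see Section 4.5.1). It follows that for a given $\tau > 0$,

$$P\!\left[|\hat{I}_n - I| \le \frac{\tau}{\sqrt{n}}(b - a)\sigma_f\right] \approx \frac{1}{\sqrt{2\pi}}\int_{-\tau}^{\tau} e^{-x^2/2}\,dx. \qquad (12.39)$$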


The right-hand side of (12.39) is the probability that a standard normal random variable attains values between $-\tau$ and $\tau$. Let us denote this probability by $1 - \alpha$. Then $\tau = z_{\alpha/2}$, which is the upper $(\alpha/2)100$th percentile of the standard normal distribution. If we denote the error of approximation, $\hat{I}_n - I$, by $E_n$, then formula (12.39) indicates that for large $n$,

$$|E_n| \le \frac{1}{\sqrt{n}}(b - a)\sigma_f z_{\alpha/2} \qquad (12.40)$$

with an approximate probability equal to $1 - \alpha$, which is called the confidence coefficient. Thus, for a fixed $\alpha$, the error bound in (12.40) is proportional to $\sigma_f$ and inversely proportional to $\sqrt{n}$. For example, if $1 - \alpha = 0.90$, then $z_{\alpha/2} = 1.645$, and

$$|E_n| \le \frac{1.645}{\sqrt{n}}(b - a)\sigma_f$$

with an approximate confidence coefficient equal to 0.90. Also, if $1 - \alpha = 0.95$, then $z_{\alpha/2} = 1.96$, and

$$|E_n| \le \frac{1.96}{\sqrt{n}}(b - a)\sigma_f$$

with an approximate confidence coefficient equal to 0.95.

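In order to compute the error bound in (12.40), an estimate of $\sigma_f^2$ is needed. Using (12.38) and the random sample $x_1, x_2, \ldots, x_n$, an estimate of $\sigma_f^2$ is given by

$$\hat{\sigma}_f^2 = \frac{1}{n}\sum_{i=1}^{n} f^2(x_i) - \left[\frac{1}{n}\sum_{i=1}^{n} f(x_i)\right]^2. \qquad (12.41)$$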
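Formulas (12.36), (12.40), and (12.41) can be combined into a short routine. This standard-library sketch uses `random.Random` with a fixed seed so that runs are repeatable, and `mc_integral` is a hypothetical name. It is applied to $\int_1^2 x^2\,dx = 7/3$, for which $\sigma_f^2 = 31/5 - 49/9$.

```python
import math
import random


def mc_integral(f, a, b, n, z=1.96, seed=0):
    """Plain Monte Carlo estimate (12.36) and the approximate error bound (12.40)."""
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var_hat = sum(v * v for v in values) / n - mean ** 2        # sigma_f^2 estimate (12.41)
    estimate = (b - a) * mean                                    # I_n in (12.36)
    bound = z * (b - a) * math.sqrt(var_hat) / math.sqrt(n)     # holds with prob. ~ 1 - alpha
    return estimate, bound


est, bound = mc_integral(lambda x: x * x, 1.0, 2.0, 200_000)
print(est, bound, 7 / 3)
```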


12.8.1. Variance Reduction

In order to increase the accuracy of the Monte Carlo approximation of $I$, the error bound in (12.40) should be reduced for a fixed value of $\alpha$. We can achieve this by increasing $n$. Alternatively, we can reduce $\sigma_f$ by considering a distribution other than the uniform distribution. This can be accomplished by using the so-called method of importance sampling, which is described next.





Let $g(x)$ be a density function that is positive over the interval $[a, b]$. Thus, $g(x) > 0$ for $a \le x \le b$, and $\int_a^b g(x)\,dx = 1$. The integral in (12.35) can be written as

$$I = \int_a^b \frac{f(x)}{g(x)}\,g(x)\,dx. \qquad (12.42)$$

In this case, $I$ is the expected value of $f(X)/g(X)$, where $X$ is a continuous random variable with the density function $g(x)$. Using a random sample $x_1, x_2, \ldots, x_n$ from this distribution, an estimate of $I$ can be obtained as

$$\hat{I}_n^* = \frac{1}{n}\sum_{i=1}^{n}\frac{f(x_i)}{g(x_i)}. \qquad (12.43)$$

The variance of $\hat{I}_n^*$ is then given by

$$\operatorname{Var}(\hat{I}_n^*) = \frac{\sigma_{fg}^2}{n},$$

where

$$\sigma_{fg}^2 = \int_a^b\left[\frac{f(x)}{g(x)}\right]^2 g(x)\,dx - I^2. \qquad (12.44)$$

As before, the error bound can be derived on the basis of the central limit theorem, using the fact that $\hat{I}_n^*$ is approximately normally distributed with mean $I$ and variance $(1/n)\sigma_{fg}^2$ for large $n$. Hence, as in (12.40),

$$|E_n^*| \le \frac{1}{\sqrt{n}}\sigma_{fg} z_{\alpha/2}$$

with an approximate probability equal to $1 - \alpha$, where $E_n^* = \hat{I}_n^* - I$. The density $g(x)$ should therefore be chosen so that an error bound smaller than the one in (12.40) can be achieved. For example, if $f(x) > 0$, and if

$$g(x) = \frac{f(x)}{\int_a^b f(x)\,dx}, \qquad (12.45)$$

then $\sigma_{fg}^2 = 0$, as can be seen from formula (12.44).

Table 12.2. Approximate Values of $I = \int_1^2 x^2\,dx$ Using Formula (12.36)

$n$     $\hat{I}_n$
50      2.3669
100     2.5087
150     2.2221
200     2.3067
250     2.3718
300     2.3115
350     2.3366

Unfortunately, since the exact value of $\int_a^b f(x)\,dx$ is the one we seek to compute, formula (12.45) cannot be used. However, by choosing $g(x)$ to be approximately proportional to $f(x)$ [assuming $f(x)$ is positive], we should expect a reduction in the variance. Note that generating random variables from the $g(x)$ distribution is more involved than simply drawing a random sample from the uniform distribution $U(a, b)$.

Example 12.8.1. Consider the integral $I = \int_1^2 x^2\,dx = 7/3 \approx 2.3333$. Suppose that we use a sample of 50 points from the uniform distribution $U(1, 2)$. Applying formula (12.36), we obtain $\hat{I}_{50} = 2.3669$. Repeating this process several times using higher values of $n$, we obtain the values in Table 12.2.

Example 12.8.2. Let us now apply the method of importance sampling to the integral $I = \int_0^1 e^x\,dx = e - 1 \approx 1.7183$. Consider the density function $g(x) = \frac{2}{3}(1 + x)$ over the interval $[0, 1]$. Using the method described in Section 3.7, a random sample $x_1, x_2, \ldots, x_n$ can be generated from this distribution as follows. The cumulative distribution function for $g(x)$ is $y = G(x) = P[X \le x]$, that is,

$$y = G(x) = \int_0^x g(t)\,dt = \frac{1}{3}(x^2 + 2x), \qquad 0 \le x \le 1.$$

The only solution of $y = G(x)$ in $[0, 1]$ is

$$x = -1 + (1 + 3y)^{1/2}, \qquad 0 \le y \le 1.$$

Hence, the inverse function of $G(x)$ is

$$G^{-1}(y) = -1 + (1 + 3y)^{1/2}, \qquad 0 \le y \le 1.$$

If $y_1, y_2, \ldots, y_n$ form a random sample of $n$ values from the uniform distribution $U(0, 1)$, then $x_1 = G^{-1}(y_1), x_2 = G^{-1}(y_2), \ldots, x_n = G^{-1}(y_n)$ will form a sample from the distribution with the density function $g(x)$. Formula (12.43) can then be applied to approximate the value of $I$ using the estimate

$$\hat{I}_n^* = \frac{3}{2n}\sum_{i=1}^{n}\frac{e^{x_i}}{1 + x_i}. \qquad (12.46)$$

Table 12.3 gives $\hat{I}_n^*$ for several values of $n$. For the sake of comparison, values of $\hat{I}_n$ from formula (12.36) [with $f(x) = e^x$, $a = 0$, $b = 1$] were also computed using a sample from the uniform distribution $U(0, 1)$. These results are also shown in Table 12.3. We note that the values of $\hat{I}_n^*$ are more stable and closer to the true value of $I$ than those of $\hat{I}_n$.

Table 12.3. Approximate Values of $I = \int_0^1 e^x\,dx$

$n$     $\hat{I}_n^*$ [Formula (12.46)]     $\hat{I}_n$ [Formula (12.36)]
50      1.7176                              1.7156
100     1.7137                              1.7025
150     1.7063                              1.6854
200     1.7297                              1.7516
250     1.7026                              1.6713
300     1.7189                              1.7201
350     1.7093                              1.6908
400     1.7188                              1.7192
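Example 12.8.2 can be reproduced in outline with the standard library. The sketch draws from $g(x) = \frac{2}{3}(1 + x)$ through $G^{-1}(y) = -1 + (1 + 3y)^{1/2}$. It then compares the sample variance of $f(X)/g(X)$ with that of $f(X)$ under $U(0, 1)$. The seed and sample size are arbitrary choices, so the numbers will not match Table 12.3 exactly. The function names are hypothetical.

```python
import math
import random


def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def importance_vs_uniform(n, seed=1):
    """Compare (12.46), g(x) = (2/3)(1 + x), with (12.36) for I = int_0^1 e^x dx."""
    rng = random.Random(seed)
    weighted = []
    for _ in range(n):
        x = -1.0 + math.sqrt(1.0 + 3.0 * rng.random())          # x = G^{-1}(y), y ~ U(0, 1)
        weighted.append(math.exp(x) / ((2.0 / 3.0) * (1.0 + x)))  # f(x)/g(x)
    plain = [math.exp(rng.random()) for _ in range(n)]         # f(x), x ~ U(0, 1), b - a = 1
    return mean_and_variance(weighted), mean_and_variance(plain)


(is_est, is_var), (u_est, u_var) = importance_vs_uniform(100_000)
print(is_est, is_var)   # estimate of I and sample variance of f(X)/g(X)
print(u_est, u_var)     # estimate of I and sample variance of f(X)
print(math.e - 1)
```

With these choices, the variance of $f(X)/g(X)$ should come out several times smaller than that of $f(X)$. This is consistent with $g$ being roughly proportional to $e^x$ on $[0, 1]$.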


12.8.2. Integrals in Higher Dimensions

The Monte Carlo method can be extended to multidimensional integrals. Consider computing the integral $I = \int_D f(\mathbf{x})\,d\mathbf{x}$, where $D$ is a bounded region in the $n$-dimensional Euclidean space, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. As before, we consider $I$ as the expected value of a stochastic process having a certain distribution over $D$. For example, we may take $\mathbf{X}$ to be a continuous random vector uniformly distributed over $D$. By this we mean that the density function of $\mathbf{X}$ equals $1/v(D)$ at every point of $D$, where $v(D)$ denotes the volume of $D$, and that the probability of $\mathbf{X}$ being outside $D$ is zero. Hence, the expected value of $f(\mathbf{X})$ is

$$E[f(\mathbf{X})] = \frac{1}{v(D)}\int_D f(\mathbf{x})\,d\mathbf{x} = \frac{I}{v(D)}.$$

The variance of $f(\mathbf{X})$, denoted by $\sigma_f^2$, is

$$\sigma_f^2 = E[f^2(\mathbf{X})] - \{E[f(\mathbf{X})]\}^2 = \frac{1}{v(D)}\int_D f^2(\mathbf{x})\,d\mathbf{x} - \left[\frac{I}{v(D)}\right]^2.$$

Let us now take a sample of $N$ independent observations on $\mathbf{X}$, namely, $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$. Then a consistent estimator of $E[f(\mathbf{X})]$ is $(1/N)\sum_{i=1}^{N} f(\mathbf{X}_i)$, and hence, an estimate of $I$ is given by

$$\hat{I}_N = \frac{v(D)}{N}\sum_{i=1}^{N} f(\mathbf{X}_i),$$

which can be used to approximate $I$. The variance of $\hat{I}_N$ is

$$\operatorname{Var}(\hat{I}_N) = \frac{[v(D)]^2\sigma_f^2}{N}. \qquad (12.47)$$

If $N$ is large enough, then by the central limit theorem, $\hat{I}_N$ is approximately normally distributed with mean $I$ and variance as in (12.47). It follows that

$$|E_N| \le \frac{v(D)}{\sqrt{N}}\sigma_f z_{\alpha/2}$$

with an approximate probability equal to $1 - \alpha$, where $E_N = \hat{I}_N - I$ is the error of approximation. This formula is analogous to formula (12.40).

The method of importance sampling can also be applied here to reduce the error of approximation. Its application is similar to the case of a single-variable integral seen earlier.
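The sketch below applies the multidimensional estimator to a region that is not a box. Here $D$ is the unit disk, so $v(D) = \pi$. Uniform points on $D$ are obtained by rejection from the square $[-1, 1]^2$, a sampling choice made only for this illustration. The test integrand $x_1^2 + x_2^2$ has exact integral $\pi/2$ over $D$, and `mc_unit_disk` is a hypothetical name.

```python
import math
import random


def mc_unit_disk(f, N, seed=2, z=1.96):
    """I_N = v(D)/N * sum f(X_i), X_i uniform on the unit disk D, plus the error bound."""
    rng = random.Random(seed)
    vol = math.pi                                   # v(D), area of the unit disk
    values = []
    while len(values) < N:
        x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x1 * x1 + x2 * x2 <= 1.0:                # rejection: accepted points are uniform on D
            values.append(f(x1, x2))
    mean = sum(values) / N
    var_hat = sum(v * v for v in values) / N - mean ** 2   # estimate of sigma_f^2
    return vol * mean, z * vol * math.sqrt(var_hat / N)


est, bound = mc_unit_disk(lambda x1, x2: x1 * x1 + x2 * x2, 100_000)
print(est, bound, math.pi / 2)
```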


12.9. APPLICATIONS IN STATISTICS 

Approximation of integrals is a problem of substantial concern for statisticians. The statistical literature in this area has grown significantly in the last 20 years, particularly in connection with integrals that arise in Bayesian statistics. Evans and Swartz (1995) presented a survey of the major techniques and approaches available for the numerical approximation of integrals in statistics. The proceedings edited by Flournoy and Tsutakawa (1991) include several interesting articles on statistical multiple integration, including a detailed description of available software to compute multidimensional integrals (see the article by Kahaner, 1991, page 9).


12.9.1. The Gauss-Hermite Quadrature 

The Gauss-Hermite quadrature mentioned earlier in Section 12.5 is often used for numerical integration in statistics because of its relation to Gaussian (normal) densities. We recall that this quadrature is defined in terms of integrals of the form $\int_{-\infty}^{\infty} e^{-x^2}f(x)\,dx$. Using formula (12.20), we have approximately

$$\int_{-\infty}^{\infty} e^{-x^2}f(x)\,dx \approx \sum_{i=0}^{n}\omega_i f(x_i),$$

where the $x_i$'s are the zeros of the Hermite polynomial $H_{n+1}(x)$ of degree $n + 1$, and the $\omega_i$'s are the corresponding weights. Tables of $x_i$ and $\omega_i$ values are given by Abramowitz and Stegun (1972, page 924) and by Krylov (1962, Appendix B).

Liu and Pierce (1994) applied the Gauss-Hermite quadrature to integrals of the form $\int_{-\infty}^{\infty} g(t)\,dt$, which can be expressed in the form

$$\int_{-\infty}^{\infty} g(t)\,dt = \int_{-\infty}^{\infty} f(t)\,\phi(t, \mu, \sigma)\,dt,$$

where $\phi(t, \mu, \sigma)$ is the normal density

$$\phi(t, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{1}{2\sigma^2}(t - \mu)^2\right],$$

and $f(t) = g(t)/\phi(t, \mu, \sigma)$. Thus, using the substitution $t = \mu + \sqrt{2}\,\sigma x$,

$$\int_{-\infty}^{\infty} g(t)\,dt = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}\,f(t)\exp\!\left[-\frac{1}{2\sigma^2}(t - \mu)^2\right]dt = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} f(\mu + \sqrt{2}\,\sigma x)\,e^{-x^2}\,dx \approx \frac{1}{\sqrt{\pi}}\sum_{i=0}^{n}\omega_i f(\mu + \sqrt{2}\,\sigma x_i), \qquad (12.48)$$

where the $x_i$'s are the zeros of the Hermite polynomial $H_{n+1}(x)$ of degree $n + 1$. We may recall that the $x_i$'s and $\omega_i$'s are chosen so that this approximation is exact for all polynomials of degrees not exceeding $2n + 1$. For this reason, Liu and Pierce (1994) recommend choosing $\mu$ and $\sigma$ in (12.48) so that $f(t)$ is well approximated by a low-order polynomial in the region where the values $\mu + \sqrt{2}\,\sigma x_i$ are taken. More specifically, the $(n + 1)$th-order Gauss-Hermite quadrature in (12.48) will be highly effective if the ratio of $g(t)$ to the normal density $\phi(t, \mu, \sigma)$ can be well approximated by a polynomial of degree not exceeding $2n + 1$ in the region where $g(t)$ is substantial. This situation arises frequently, for example, when $g(t)$ is a likelihood function [if $g(t) > 0$], or the product of a likelihood function and a normal density. This was pointed out by Liu and Pierce (1994), who gave several examples to demonstrate the usefulness of the approximation in (12.48).
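The effect of choosing $\mu$ and $\sigma$ in (12.48) can be seen on an integrand whose exact integral is known. This sketch assumes NumPy's `hermgauss`. The hypothetical integrand $g(t) = e^{-2(t-2)^2}(1 + t^2)$, divided by the $N(2, 0.5^2)$ density, is a quadratic. A three-point rule with $\mu = 2$ and $\sigma = 0.5$ is therefore exact up to rounding. The same three points with $\mu = 0$ and $\sigma = 1$ are not exact. The helper names are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def normal_pdf(t, mu, sigma):
    return np.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def gauss_hermite_normal(g, mu, sigma, n_points):
    """(12.48): int g(t) dt ~ (1/sqrt(pi)) sum_i w_i f(mu + sqrt(2) sigma x_i), f = g / phi."""
    x, w = hermgauss(n_points)
    t = mu + math.sqrt(2.0) * sigma * x
    f = g(t) / normal_pdf(t, mu, sigma)
    return float(np.sum(w * f) / math.sqrt(math.pi))


def g(t):
    # hypothetical integrand: N(2, 0.5^2)-shaped bump times (1 + t^2)
    return np.exp(-2.0 * (t - 2.0) ** 2) * (1.0 + t ** 2)


exact = 0.5 * math.sqrt(2.0 * math.pi) * (1.0 + 4.0 + 0.25)   # sqrt(2 pi) sigma E(1 + T^2)
print(gauss_hermite_normal(g, 2.0, 0.5, 3), exact)   # mu, sigma matched to g
print(gauss_hermite_normal(g, 0.0, 1.0, 3), exact)   # standard normal, same 3 points
```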


12.9.2. Minimum Mean Squared Error Quadrature 

Correlated observations may arise in some experimental work (Piegorsch and Bailer, 1993). Consider, for example, the model

$$y_{qr} = f(t_q) + \epsilon_{qr},$$

where $y_{qr}$ represents the observed value from experimental unit $r$ at time $t_q$ ($q = 0, 1, \ldots, m$; $r = 1, 2, \ldots, n$), $f(t_q)$ is the underlying response function, and $\epsilon_{qr}$ is a random error term. It is assumed that $E(\epsilon_{qr}) = 0$, $\operatorname{Cov}(\epsilon_{pr}, \epsilon_{qr}) = \sigma_{pq}$, and $\operatorname{Cov}(\epsilon_{pr}, \epsilon_{qs}) = 0$ ($r \ne s$) for all $p, q, r, s$. The area under the response curve over the interval $t_0 < t < t_m$ is

$$A = \int_{t_0}^{t_m} f(t)\,dt. \qquad (12.49)$$

This is an important measure for a variety of experimental situations, including the assessment of chemical bioavailability in drug disposition studies (Gibaldi and Perrier, 1982, Chapters 2, 7) and other clinical settings. If the functional form of $f(\cdot)$ is unknown, the integral in (12.49) is estimated by numerical methods, namely by a quadrature approximation. By definition, a quadrature estimator of this integral is

$$\hat{A} = \sum_{q=0}^{m}\phi_q\hat{f}_q, \qquad (12.50)$$

where $\hat{f}_q$ is some unbiased estimator of $f_q = f(t_q)$ with $\operatorname{Cov}(\hat{f}_p, \hat{f}_q) = \sigma_{pq}/n$, and the $\phi_q$'s form a set of quadrature coefficients.



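The expected value of $\hat{A}$,

$$E(\hat{A}) = \sum_{q=0}^{m}\phi_q f_q,$$

is not necessarily equal to $A$, because of the quadrature approximation employed in calculating $\hat{A}$. The bias in estimating $A$ is

$$\text{bias} = E(\hat{A}) - A,$$

and the mean squared error (MSE) of $\hat{A}$ is given by

$$\operatorname{MSE}(\hat{A}) = \operatorname{Var}(\hat{A}) + [E(\hat{A}) - A]^2 = \sum_{p=0}^{m}\sum_{q=0}^{m}\phi_p\phi_q\frac{\sigma_{pq}}{n} + \left[\sum_{q=0}^{m}\phi_q f_q - A\right]^2,$$

which can be written as

$$\operatorname{MSE}(\hat{A}) = \frac{1}{n}\boldsymbol{\phi}'\mathbf{V}\boldsymbol{\phi} + (\mathbf{f}'\boldsymbol{\phi} - A)^2,$$

where $\mathbf{V} = (\sigma_{pq})$, $\mathbf{f} = (f_0, f_1, \ldots, f_m)'$, and $\boldsymbol{\phi} = (\phi_0, \phi_1, \ldots, \phi_m)'$. Hence,

$$\operatorname{MSE}(\hat{A}) = \boldsymbol{\phi}'\left(\frac{1}{n}\mathbf{V} + \mathbf{f}\mathbf{f}'\right)\boldsymbol{\phi} - 2A\,\mathbf{f}'\boldsymbol{\phi} + A^2. \qquad (12.51)$$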


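Let us now seek an optimum value of $\boldsymbol{\phi}$ that minimizes $\operatorname{MSE}(\hat{A})$ in (12.51). For this purpose, we equate the gradient of $\operatorname{MSE}(\hat{A})$ with respect to $\boldsymbol{\phi}$, namely $\nabla_{\boldsymbol{\phi}}\operatorname{MSE}(\hat{A})$, to zero. We obtain

$$\nabla_{\boldsymbol{\phi}}\operatorname{MSE}(\hat{A}) = 2\left(\frac{1}{n}\mathbf{V} + \mathbf{f}\mathbf{f}'\right)\boldsymbol{\phi} - 2A\mathbf{f} = \mathbf{0}. \qquad (12.52)$$

In order for this equation to have a solution, the vector $A\mathbf{f}$ must belong to the column space of the matrix $(1/n)\mathbf{V} + \mathbf{f}\mathbf{f}'$. Note that this matrix is positive semidefinite. If its inverse exists, then it is positive definite, and the only solution $\boldsymbol{\phi}^*$ to (12.52), namely,

$$\boldsymbol{\phi}^* = A\left(\frac{1}{n}\mathbf{V} + \mathbf{f}\mathbf{f}'\right)^{-1}\mathbf{f}, \qquad (12.53)$$

yields a unique minimum of $\operatorname{MSE}(\hat{A})$ (see Section 7.7). Using $\boldsymbol{\phi}^*$ in (12.50), we obtain the following estimate of $A$:

$$\hat{A}^* = A\,\hat{\mathbf{f}}'\left(\frac{1}{n}\mathbf{V} + \mathbf{f}\mathbf{f}'\right)^{-1}\mathbf{f}. \qquad (12.54)$$

Using the Sherman-Morrison formula (see Exercise 2.14), (12.54) can be written as

$$\hat{A}^* = nA\,\frac{\hat{\mathbf{f}}'\mathbf{V}^{-1}\mathbf{f}}{1 + n\,\mathbf{f}'\mathbf{V}^{-1}\mathbf{f}}, \qquad (12.55)$$

where $\hat{\mathbf{f}} = (\hat{f}_0, \hat{f}_1, \ldots, \hat{f}_m)'$. We refer to $\hat{A}^*$ as an MSE-optimal estimator of $A$. Replacing $\mathbf{f}$ with its unbiased estimator $\hat{\mathbf{f}}$, and $A$ with some initial estimate, say $A_0 = \hat{\mathbf{f}}'\boldsymbol{\phi}_0$, where $\boldsymbol{\phi}_0$ is an initial value of $\boldsymbol{\phi}$, yields the approximate MSE-optimal estimator

$$\hat{A}^{**} = \frac{c}{1 + c}\,A_0, \qquad (12.56)$$

where $c$ is the quadratic form

$$c = n\,\hat{\mathbf{f}}'\mathbf{V}^{-1}\hat{\mathbf{f}}.$$

This estimator has the form $\hat{\mathbf{f}}'\hat{\boldsymbol{\phi}}$, where $\hat{\boldsymbol{\phi}} = [c/(1 + c)]\boldsymbol{\phi}_0$. Since $c > 0$, $\hat{A}^{**}$ shrinks the initial estimate $A_0$ toward zero. This creates a biased estimator of $A$ with a smaller variance, which can result in an overall reduction in MSE. A similar approach is used in ridge regression to estimate the parameters of a linear model when the columns of the corresponding matrix $\mathbf{X}$ are multicollinear (see Section 5.6.6).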

We note that this procedure requires knowledge of $\mathbf{V}$. In many applications, it may not be possible to specify $\mathbf{V}$. Piegorsch and Bailer (1993) used an estimate of $\mathbf{V}$ when $\mathbf{V}$ was assumed to have certain particular patterns, such as $\mathbf{V} = \sigma^2\mathbf{I}$ or $\mathbf{V} = \sigma^2[(1 - \rho)\mathbf{I} + \rho\mathbf{J}]$. Here $\mathbf{I}$ and $\mathbf{J}$ are, respectively, the identity matrix and the square matrix of ones, and $\sigma^2$ and $\rho$ are unknown constants. For example, under the equal variance assumption $\mathbf{V} = \sigma^2\mathbf{I}$, the ratio $c/(1 + c)$ is equal to

$$\frac{c}{1 + c} = \left(1 + \frac{\sigma^2}{n\,\hat{\mathbf{f}}'\hat{\mathbf{f}}}\right)^{-1}.$$

Suppose that $\hat{\mathbf{f}}$ is normally distributed, $\hat{\mathbf{f}} \sim N[\mathbf{f}, (1/n)\mathbf{V}]$. This is the case, for example, when $\hat{\mathbf{f}}$ is a vector of means, $\hat{\mathbf{f}} = (\bar{y}_0, \bar{y}_1, \ldots, \bar{y}_m)'$ with $\bar{y}_q = (1/n)\sum_{r=1}^{n} y_{qr}$. Then an unbiased estimate of $\sigma^2$ is given by

$$s^2 = \frac{1}{(m + 1)(n - 1)}\sum_{q=0}^{m}\sum_{r=1}^{n}(y_{qr} - \bar{y}_q)^2.$$

Substituting $s^2$ in place of $\sigma^2$ in (12.56), we obtain the area estimator

$$\hat{A}^{**} = A_0\left(1 + \frac{s^2}{n\,\hat{\mathbf{f}}'\hat{\mathbf{f}}}\right)^{-1}.$$

A similar quadrature approximation of $A$ in (12.49) was considered earlier by Katz and D'Argenio (1983), using a trapezoidal approximation to $A$. The quadrature points were selected so that they minimize the expectation of the squared difference between the exact integral and the quadrature approximation. This approach was applied to simulated pharmacokinetic problems.
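The estimators (12.54) through (12.56) translate into a few lines of linear algebra. This sketch assumes NumPy and uses simulated data from a hypothetical response curve $f(t) = te^{-t}$. It takes trapezoidal weights for the initial $\boldsymbol{\phi}_0$, since the text leaves $\boldsymbol{\phi}_0$ unspecified. The tests check numerically that (12.54) and the Sherman-Morrison form (12.55) agree. They also check that, under $\mathbf{V} = s^2\mathbf{I}$ with $\mathbf{f}$ replaced by $\hat{\mathbf{f}}$, the estimator reduces to $A_0[1 + s^2/(n\hat{\mathbf{f}}'\hat{\mathbf{f}})]^{-1}$. All function names are hypothetical.

```python
import numpy as np


def a_star_direct(f_hat, f, V, A, n):
    """(12.53)-(12.54): phi* = A [(1/n)V + f f']^{-1} f and A* = f_hat' phi*."""
    phi_star = A * np.linalg.solve(V / n + np.outer(f, f), f)
    return float(f_hat @ phi_star)


def a_star_sherman_morrison(f_hat, f, V, A, n):
    """(12.55): A* = n A f_hat' V^{-1} f / (1 + n f' V^{-1} f)."""
    Vinv_f = np.linalg.solve(V, f)
    return float(n * A * (f_hat @ Vinv_f) / (1.0 + n * (f @ Vinv_f)))


def shrunken_area(y, t):
    """A** = A0 [1 + s^2 / (n f_hat' f_hat)]^{-1} under V = sigma^2 I.

    y[q, r] is the observation on unit r at time t[q]. The initial weights
    phi_0 are taken to be trapezoidal weights on t (an assumption).
    """
    m1, n = y.shape
    f_hat = y.mean(axis=1)                                  # ybar_q
    dt = np.diff(t)
    phi0 = np.zeros(m1)
    phi0[:-1] += dt / 2.0
    phi0[1:] += dt / 2.0
    A0 = float(f_hat @ phi0)                                # A_0 = f_hat' phi_0
    s2 = float(((y - f_hat[:, None]) ** 2).sum() / (m1 * (n - 1)))   # s^2
    return A0 / (1.0 + s2 / (n * float(f_hat @ f_hat))), A0, s2, f_hat


rng = np.random.default_rng(3)
t = np.linspace(0.0, 4.0, 9)                                # hypothetical times t_0, ..., t_8
y = (t * np.exp(-t))[:, None] + 0.05 * rng.standard_normal((t.size, 6))   # n = 6 units
A_ss, A0, s2, f_hat = shrunken_area(y, t)
print(A0, A_ss)
```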


12.9.3. Moments of a Ratio of Quadratic Forms 

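Consider the ratio of quadratic forms,

$$Q = \frac{\mathbf{y}'\mathbf{A}\mathbf{y}}{\mathbf{y}'\mathbf{B}\mathbf{y}}, \qquad (12.57)$$

where $\mathbf{A}$ and $\mathbf{B}$ are symmetric matrices, $\mathbf{B}$ is positive definite, and $\mathbf{y}$ is an $n \times 1$ random vector. Ratios such as $Q$ are frequently encountered in statistics and econometrics. In general, their exact distributions are mathematically intractable, especially when the quadratic forms are not independent. For this reason, the derivation of the moments of such ratios, for the purpose of approximating their distributions, is of interest. Sutradhar and Bartlett (1989) obtained approximate expressions for the first four moments of the ratio $Q$ for a normally distributed $\mathbf{y}$. The moments were utilized to approximate the distribution of $Q$. This approximation was then applied to calculate the percentile points of a modified $F$-test statistic for testing treatment effects in a one-way model under correlated observations. Morin (1992) derived exact, but complicated, expressions for the first four moments of $Q$ for a normally distributed $\mathbf{y}$. The moments are expressed in terms of confluent hypergeometric functions of many variables.

If $\mathbf{y}$ is not normally distributed, then no tractable formulas exist for the moments of $Q$ in (12.57). Hence, manageable and computable approximations for these moments would be helpful. Lieberman (1994) used the method of Laplace to provide general approximations for the moments of $Q$ without making the normality assumption on $\mathbf{y}$. Lieberman showed that if $E(\mathbf{y}'\mathbf{B}\mathbf{y})$ and $E[(\mathbf{y}'\mathbf{A}\mathbf{y})^k]$ exist for $k \ge 1$, then the Laplace approximation of $E(Q^k)$, the $k$th noncentral moment of $Q$, is given by

$$E_L(Q^k) = \frac{E[(\mathbf{y}'\mathbf{A}\mathbf{y})^k]}{[E(\mathbf{y}'\mathbf{B}\mathbf{y})]^k}. \qquad (12.58)$$

In particular, if $\mathbf{y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then the Laplace approximations for the mean and second noncentral moment of $Q$ are written explicitly as

$$E_L(Q) = \frac{\operatorname{tr}(\mathbf{A}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}}{\operatorname{tr}(\mathbf{B}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu}},$$

$$E_L(Q^2) = \frac{E[(\mathbf{y}'\mathbf{A}\mathbf{y})^2]}{[E(\mathbf{y}'\mathbf{B}\mathbf{y})]^2} = \frac{\operatorname{Var}(\mathbf{y}'\mathbf{A}\mathbf{y}) + [E(\mathbf{y}'\mathbf{A}\mathbf{y})]^2}{[E(\mathbf{y}'\mathbf{B}\mathbf{y})]^2} = \frac{2\operatorname{tr}[(\mathbf{A}\boldsymbol{\Sigma})^2] + 4\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\mu} + [\operatorname{tr}(\mathbf{A}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}]^2}{[\operatorname{tr}(\mathbf{B}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{B}\boldsymbol{\mu}]^2}$$

(see Searle, 1971, Section 2.5 for expressions for the mean and variance of a quadratic form in normal variables).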
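For normal $\mathbf{y}$, the two Laplace approximations above are in closed form. This sketch assumes NumPy, and the matrices and mean vector in it are hypothetical. It evaluates the two formulas and compares them with Monte Carlo estimates of the moments that enter (12.58). Note that $E_L(Q^k)$ is only an approximation to $E(Q^k)$. The last printed line is a Monte Carlo estimate of $E(Q)$ itself, which need not coincide with $E_L(Q)$.

```python
import numpy as np


def laplace_moments(A, B, mu, Sigma):
    """E_L(Q) and E_L(Q^2) for Q = y'Ay / y'By with y ~ N(mu, Sigma), from (12.58)."""
    AS, BS = A @ Sigma, B @ Sigma
    mean_A = np.trace(AS) + mu @ A @ mu                          # E(y'Ay)
    mean_B = np.trace(BS) + mu @ B @ mu                          # E(y'By)
    var_A = 2 * np.trace(AS @ AS) + 4 * mu @ A @ Sigma @ A @ mu  # Var(y'Ay)
    return mean_A / mean_B, (var_A + mean_A ** 2) / mean_B ** 2


A = np.diag([1.0, 2.0, 3.0])          # hypothetical example
B = np.eye(3)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.eye(3)
EQ, EQ2 = laplace_moments(A, B, mu, Sigma)

# Monte Carlo estimates of the moments that enter (12.58)
rng = np.random.default_rng(4)
y = mu + rng.standard_normal((200_000, 3))        # rows are draws from N(mu, I)
qa = np.einsum("ij,jk,ik->i", y, A, y)            # y'Ay for each draw
qb = np.einsum("ij,jk,ik->i", y, B, y)            # y'By for each draw
print(EQ, qa.mean() / qb.mean())
print(EQ2, (qa ** 2).mean() / qb.mean() ** 2)
print("Monte Carlo E(Q):", (qa / qb).mean())
```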

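Example 12.9.1. Consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $p$, $\boldsymbol{\beta}$ is a vector of unknown parameters, and $\boldsymbol{\epsilon}$ is a random error vector. Under the null hypothesis of no serial correlation in the elements of $\boldsymbol{\epsilon}$, we have $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. The corresponding Durbin-Watson (1950, 1951) test statistic is given by

$$d = \frac{\boldsymbol{\epsilon}'\mathbf{P}\mathbf{A}_1\mathbf{P}\boldsymbol{\epsilon}}{\boldsymbol{\epsilon}'\mathbf{P}\boldsymbol{\epsilon}},$$

where $\mathbf{P} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, and $\mathbf{A}_1$ is the $n \times n$ matrix

$$\mathbf{A}_1 = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}.$$

Then, by (12.58), the Laplace approximation of $E(d)$ is

$$E_L(d) = \frac{E(\boldsymbol{\epsilon}'\mathbf{P}\mathbf{A}_1\mathbf{P}\boldsymbol{\epsilon})}{E(\boldsymbol{\epsilon}'\mathbf{P}\boldsymbol{\epsilon})}. \qquad (12.59)$$

Durbin and Watson (1951) showed that $d$ is distributed independently of its own denominator. Consequently, the moments of the ratio $d$ are the ratios of the corresponding moments of the numerator and denominator, that is,

$$E(d^k) = \frac{E[(\boldsymbol{\epsilon}'\mathbf{P}\mathbf{A}_1\mathbf{P}\boldsymbol{\epsilon})^k]}{E[(\boldsymbol{\epsilon}'\mathbf{P}\boldsymbol{\epsilon})^k]}. \qquad (12.60)$$

From (12.59) and (12.60) we note that the Laplace approximation for the mean, $E(d)$, is exact. For $k \ge 2$, Lieberman (1994) showed that

$$\frac{E_L(d^k)}{E(d^k)} = 1 + O\!\left(\frac{1}{n}\right).$$

Thus, the relative error of approximating higher-order moments of $d$ is $O(1/n)$, regardless of the matrix $\mathbf{X}$.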
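For the Durbin-Watson statistic, (12.59) reduces to $E_L(d) = \operatorname{tr}(\mathbf{P}\mathbf{A}_1)/(n - p)$. This follows because $E(\boldsymbol{\epsilon}'\mathbf{M}\boldsymbol{\epsilon}) = \sigma^2\operatorname{tr}(\mathbf{M})$ for $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$, because $\mathbf{P}$ is idempotent, and because $\operatorname{tr}(\mathbf{P}) = n - p$. The sketch below assumes NumPy and uses a hypothetical design matrix with an intercept and a linear trend. It compares $E_L(d)$ with a simulated mean of $d$; since the Laplace approximation of the mean is exact here, the two should agree up to Monte Carlo error.

```python
import numpy as np


def dw_matrix(n):
    """A_1 = D'D, where D is the (n - 1) x n first-difference matrix."""
    D = np.diff(np.eye(n), axis=0)
    return D.T @ D


n, p = 20, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # hypothetical design
P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
A1 = dw_matrix(n)

# (12.59): E(e'P A1 P e) = sigma^2 tr(P A1) and E(e'P e) = sigma^2 (n - p)
EL_d = np.trace(P @ A1) / (n - p)

rng = np.random.default_rng(5)
e = rng.standard_normal((100_000, n))
r = e @ P                                  # each row is the residual vector P e (P symmetric)
d = (np.diff(r, axis=1) ** 2).sum(axis=1) / (r ** 2).sum(axis=1)
print(EL_d, d.mean())
```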


12.9.4. Laplace’s Approximation in Bayesian Statistics 

Suppose (Kass, Tierney, and Kadane, 1991) that a data vector $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$ has a distribution with the density function $p(\mathbf{y}\mid\theta)$, where $\theta$ is an unknown parameter. Let $L(\theta)$ denote the corresponding likelihood function, which is proportional to $p(\mathbf{y}\mid\theta)$. In Bayesian statistics, a prior density $\pi(\theta)$ is assumed on $\theta$, and inferences are based on the posterior density $q(\theta\mid\mathbf{y})$. This density is proportional to $L(\theta)\pi(\theta)$, and the proportionality constant is determined by requiring that $q(\theta\mid\mathbf{y})$ integrate to one. For a given real-valued function $g(\theta)$, its posterior expectation is given by

$$E[g(\theta)\mid\mathbf{y}] = \frac{\int g(\theta)L(\theta)\pi(\theta)\,d\theta}{\int L(\theta)\pi(\theta)\,d\theta}. \qquad (12.61)$$

Tierney, Kass, and Kadane (1989) expressed the integrands in (12.61) as follows:

$$g(\theta)L(\theta)\pi(\theta) = b_N(\theta)\exp[-nh_N(\theta)],$$

$$L(\theta)\pi(\theta) = b_D(\theta)\exp[-nh_D(\theta)],$$

where $b_N(\theta)$ and $b_D(\theta)$ are smooth functions that do not depend on $n$, and $h_N(\theta)$ and $h_D(\theta)$ are constant-order functions of $n$ as $n \to \infty$. Formula (12.61) can then be written as

$$E[g(\theta)\mid\mathbf{y}] = \frac{\int b_N(\theta)\exp[-nh_N(\theta)]\,d\theta}{\int b_D(\theta)\exp[-nh_D(\theta)]\,d\theta}. \qquad (12.62)$$

Applying Laplace's approximation in (12.25) to the integrals in (12.62), we obtain, if $n$ is large,

$$E[g(\theta)\mid\mathbf{y}] \approx \left[\frac{h_D''(\hat\theta_D)}{h_N''(\hat\theta_N)}\right]^{1/2}\frac{b_N(\hat\theta_N)\exp[-nh_N(\hat\theta_N)]}{b_D(\hat\theta_D)\exp[-nh_D(\hat\theta_D)]}, \qquad (12.63)$$

where $\hat\theta_N$ and $\hat\theta_D$ are the locations of the single maxima of $-h_N(\theta)$ and $-h_D(\theta)$, respectively. In particular, if we choose $h_N(\theta) = h_D(\theta) = -(1/n)\log[L(\theta)\pi(\theta)]$, $b_N(\theta) = g(\theta)$, and $b_D(\theta) = 1$, then (12.63) reduces to

$$E[g(\theta)\mid\mathbf{y}] \approx g(\hat\theta), \qquad (12.64)$$

where $\hat\theta$ is the point at which $(1/n)\log[L(\theta)\pi(\theta)]$ attains its maximum.

Formula (12.64) provides a first-order approximation of $E[g(\theta)\mid\mathbf{y}]$. This approximation is often called the modal approximation because $\hat\theta$ is the mode of the posterior density. A more accurate second-order approximation of $E[g(\theta)\mid\mathbf{y}]$ was given by Kass, Tierney, and Kadane (1991).
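A small conjugate example makes (12.63) and (12.64) concrete. This standard-library sketch assumes a binomial likelihood with a $\text{Beta}(a, b)$ prior and $g(\theta) = \theta$, so that the exact posterior mean $(y + a)/(n + a + b)$ is available for comparison. For (12.63), it takes $b_N = b_D = 1$ and absorbs $\log g(\theta)$ into $h_N$, which is one admissible choice under the representation above. It then compares that result with the modal approximation (12.64). The function names are hypothetical.

```python
import math


def log_kernel(theta, p, q):
    """p log(theta) + q log(1 - theta)."""
    return p * math.log(theta) + q * math.log(1.0 - theta)


def laplace_piece(p, q):
    """Mode, log kernel at the mode, and minus the second derivative there."""
    mode = p / (p + q)
    curvature = p / mode ** 2 + q / (1.0 - mode) ** 2
    return mode, log_kernel(mode, p, q), curvature


def posterior_mean_approximations(y, n, a=1.0, b=1.0):
    """y successes in n Bernoulli trials, Beta(a, b) prior, g(theta) = theta."""
    p, q = y + a - 1.0, n - y + b - 1.0            # log[L(theta) pi(theta)] up to a constant
    mode_D, log_D, curv_D = laplace_piece(p, q)     # denominator of (12.62)
    _, log_N, curv_N = laplace_piece(p + 1.0, q)    # numerator, with g(theta) = theta absorbed
    ratio_form = math.sqrt(curv_D / curv_N) * math.exp(log_N - log_D)   # (12.63), b_N = b_D = 1
    modal = mode_D                                                        # (12.64)
    exact = (y + a) / (n + a + b)                                         # Beta posterior mean
    return ratio_form, modal, exact


ratio_form, modal, exact = posterior_mean_approximations(6, 20)
print(ratio_form, modal, exact)
```

With $y = 6$ and $n = 20$ under a uniform prior, the ratio form lands much closer to the exact mean $7/22$ than the modal value $0.3$ does.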


12.9.5. Other Methods of Approximating Integrals in Statistics

There are several major techniques and approaches available for the numerical approximation of integrals in statistics that are beyond the scope of this book. These techniques, which include the saddlepoint approximation and Markov chain Monte Carlo, have received a great deal of attention in the statistical literature in recent years.

The saddlepoint method is designed to approximate integrals of the Laplace type in which both the integrand and the contour of integration are allowed to be complex valued. It is a powerful tool for obtaining accurate expressions for densities and distribution functions. A good introduction to the basic principles underlying this method was given by De Bruijn (1961, Chapter 5). Daniels (1954) is credited with having introduced it in statistics in the context of approximating the density of a sample mean of independent and identically distributed random variables.

Markov chain Monte Carlo (MCMC) is a general method for the simulation of stochastic processes having probability densities known up to a constant of proportionality. It generally deals with high-dimensional statistical problems, and it has come into prominence in statistical applications during the past several years. Although MCMC has potential applications in several areas of statistics, most attention to date has been focused on Bayesian applications.

For a review of these techniques, see, for example, Geyer (1992), Evans 
and Swartz (1995), Goutis and Casella (1999), and Strawderman (2000). 


FURTHER READING AND ANNOTATED BIBLIOGRAPHY 

Abramowitz, M., and L A. Stegun, eds. (1972). Handbook of Mathematical Functions 
with Formulas, Graphs, and Mathematical Tables. Wiley, New York. (This volume 
is an excellent source for a wide variety of numerical tables of mathematical 
functions. Chap. 25 gives zéros of Legendre, Hermite, and Laguerre polynomials 
along with their corresponding weight factors.) 

Copson, E. T. (1965). Asymptotic Expansions. Cambridge University Press, London. 
(The method of Laplace is discussed in Chap. 5.) 

Courant, R., and F. John (1965). Introduction to Calculus and Analysis, Volume 1. 
Wiley, New York. (The trapézoïdal and Simpson’s methods are discussed in 
Chap. 6.) 

Daniels, H. (1954). “Saddlepoint approximation in statistics.” A/z/î.Mtït/z. Statist., 25 , 
631-650. 

Davis, P. J., and P. Rabinowitz (1975). Methods of Numerical Intégration. Academie 
Press, New York. (This book présents several useful numerical intégration 
methods, including approximate intégrations over finite or infinité intervals as 
well as intégration in two or more dimensions.) 

De Bruijn, N. G. (1961). Asymptotic Methods in Analysis, 2nd ed. North-Holland, 
Amsterdam. (Chap. 4 covers the method of Laplace, and the saddlepoint method 
is the topic of Chap. 5.) 

Durbin, J., and G. S. Watson (1950). “Testing for serial corrélation in least squares 
régression, I.” Biometrika, 37, 409-428. 

Durbin, J., and G. S. Watson (1951). “Testing for serial corrélation in least squares 
régression, II.” Biometrika, 38 , 159-178. 

Evans, M., and T. Swartz (1995). “Methods for approximating integrals in statistics
with special emphasis on Bayesian integration problems.” Statist. Sci., 10, 254-272.

Flournoy, N., and R. K. Tsutakawa, eds. (1991). Statistical Multiple Integration,
Contemporary Mathematics 115. Amer. Math. Soc., Providence, Rhode Island.
(This volume contains the proceedings of an AMS-IMS-SIAM joint summer
research conference on statistical multiple integration, which was held at
Humboldt University, Arcata, California, June 17-23, 1989.)

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York. (Section 18.3 of this 
book contains proofs associated with the method of Laplace.) 

Geyer, C. J. (1992). “Practical Markov chain Monte Carlo.” Statist. Sci., 7, 473-511.





Ghazal, G. A. (1994). “Moments of the ratio of two dependent quadratic forms.”
Statist. Prob. Letters, 20, 313-319.

Gibaldi, M., and D. Perrier (1982). Pharmacokinetics, 2nd ed. Dekker, New York. 

Goutis, C., and G. Casella (1999). “Explaining the saddlepoint approximation.”
Amer. Statist., 53, 216-224.

Haber, S. (1970). “Numerical evaluation of multiple integrals.” SIAM Rev., 12,
481-526.

Kahaner, D. K. (1991). “A survey of existing multidimensional quadrature routines.”
In Statistical Multiple Integration, Contemporary Mathematics 115, N. Flournoy
and R. K. Tsutakawa, eds. Amer. Math. Soc., Providence, pp. 9-22.

Kass, R. E., L. Tierney, and J. B. Kadane (1991). “Laplace's method in Bayesian
analysis.” In Statistical Multiple Integration, Contemporary Mathematics 115,
N. Flournoy and R. K. Tsutakawa, eds. Amer. Math. Soc., Providence, pp. 89-99.

Katz, D., and D. Z. D'Argenio (1983). “Experimental design for estimating integrals
by numerical quadrature, with applications to pharmacokinetic studies.”
Biometrics, 39, 621-628.

Krylov, V. I. (1962). Approximate Calculation of Integrals. Macmillan, New York. (This
book considers only the problem of approximate integration of functions of a
single variable. It was translated from the Russian by A. H. Stroud.)

Lange, K. (1999). Numerical Analysis for Statisticians. Springer, New York. (This book 
contains a wide variety of topics on numerical analysis of potential interest to 
statisticians, including recent topics such as bootstrap calculations and the 
Markov chain Monte Carlo method.) 

Lieberman, O. (1994). “A Laplace approximation to the moments of a ratio of 
quadratic forms.” Biometrika, 81 , 681-690. 

Liu, Q., and D. A. Pierce (1994). “A note on Gauss-Hermite quadrature.” Biometrika, 
81 , 624-629. 

Morin, D. (1992). “Exact moments of ratios of quadratic forms.” Metron, 50 , 59-78. 

Morland, T. (1998). “Approximations to the normal distribution function.” Math. 
Gazette, 82 , 431-437. 

Nonweiler, T. R. F. (1984). Computational Mathematics. Ellis Horwood, Chichester, 
England. (Numerical quadrature is covered in Chap. 5.) 

Phillips, C., and B. Cornelius (1986). Computational Numerical Methods. Ellis
Horwood, Chichester, England. (Numerical integration is the subject of Chap. 6.)

Phillips, G. M., and P. J. Taylor (1973). Theory and Applications of Numerical Analysis.
Academic Press, New York. (Gaussian quadrature is covered in Chap. 6.)

Piegorsch, W. W., and A. J. Bailer (1993). “Minimum mean-square error quadrature.”
J. Statist. Comput. Simul., 46, 217-234.

Ralston, A., and P. Rabinowitz (1978). A First Course in Numerical Analysis. 
McGraw-Hill, New York. (Gaussian quadrature is covered in Chap. 4.) 

Reid, W. H., and S. J. Skates (1986). “On the asymptotic approximation of integrals.”
SIAM J. Appl. Math., 46, 351-358.

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, 
Reading, Massachusetts. 

Searle, S. R. (1971). Linear Models. Wiley, New York. 





Shoup, T. E. (1984). Applied Numerical Methods for the Microcomputer. Prentice-Hall, 
Englewood Cliffs, New Jersey. 

Stark, P. A. (1970). Introduction to Numerical Methods. Macmillan, London. 

Strawderman, R. L. (2000). “Higher-order asymptotic approximation: Laplace,
saddlepoint, and related methods.” J. Amer. Statist. Assoc., 95, 1358-1364.

Stroud, A. H. (1971). Approximate Calculation of Multiple Integrals. Prentice-Hall,
Englewood Cliffs, New Jersey.

Sutradhar, B. C., and R. F. Bartlett (1989). “An approximation to the distribution of
the ratio of two general quadratic forms with application to time series valued
designs.” Comm. Statist. Theory Methods, 18, 1563-1588.

Tierney, L., R. E. Kass, and J. B. Kadane (1989). “Fully exponential Laplace
approximations to expectations and variances of nonpositive functions.” J. Amer.
Statist. Assoc., 84, 710-716.

Wong, R. (1989). Asymptotic Approximations of Integrals. Academic Press, New York.
(This is a useful reference book on the method of Laplace and Mellin transform
techniques for multiple integrals. All results are accompanied by error bounds.)

EXERCISES 

In Mathematics

12.1. Consider the integral

$$I_n = \int_1^n \log x \, dx.$$

It is easy to show that

$$I_n = n \log n - n + 1.$$

(a) Approximate $I_n$ by using the trapezoidal method and the partition points $x_0 = 1, x_1 = 2, \ldots, x_{n-1} = n$, and verify that

$$I_n \approx \log(n!) - \tfrac{1}{2} \log n.$$

(b) Deduce from (a) that $n!$ and $n^{n+1/2} e^{-n}$ are of the same order of magnitude, which is essentially what is stated in Stirling's formula (see Example 12.6.1).
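A minimal Python sketch of part (a), assuming unit spacing $h = 1$ as in the stated partition. It evaluates the trapezoidal sum directly and compares it with $\log(n!) - \frac{1}{2}\log n$ (computed through `math.lgamma`) and with the exact value $n \log n - n + 1$. The function names are illustrative, not taken from the text.

```python
import math

def trapezoid_log_integral(n: int) -> float:
    """Trapezoidal rule for the integral of log x over [1, n] with points 1, 2, ..., n."""
    values = [math.log(k) for k in range(1, n + 1)]
    # Endpoints get weight 1/2 and interior points weight 1 (spacing h = 1).
    return 0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1]

def exact_log_integral(n: int) -> float:
    return n * math.log(n) - n + 1

def stirling_ratio(n: int) -> float:
    # n! / (n^(n + 1/2) e^(-n)), computed on the log scale to avoid overflow.
    return math.exp(math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n)

for n in (5, 10, 100):
    trap = trapezoid_log_integral(n)
    closed_form = math.lgamma(n + 1) - 0.5 * math.log(n)  # log(n!) - (1/2) log n
    print(n, exact_log_integral(n), trap, closed_form, stirling_ratio(n))
```

Because $\log x$ is concave, the trapezoidal sum falls below $I_n$. The last column is the ratio $n!/(n^{n+1/2}e^{-n})$, which stays bounded as part (b) asserts; Stirling's formula identifies its limit as $\sqrt{2\pi} \approx 2.5066$.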

12.2. Obtain an approximation of the integral $\int_0^1 dx/(1+x)$ by Simpson's method for the following values of $n$: 2, 4, 8, 16. Show that when $n = 8$, the error of approximation is less than or equal to 0.000002.





12.3. Use Gauss-Legendre quadrature with $n = 2$ to approximate the value of the integral $\int \sin\theta \, d\theta$. Give an upper bound on the error of approximation using formula (12.16).

12.4. (a) Show that

$$\int_m^{\infty} e^{-x^2} \, dx < \frac{1}{m} e^{-m^2}, \quad m > 0.$$

[Hint: Use the inequality $e^{-x^2} < e^{-mx}$ for $x > m$.]

(b) Find a value of $m$ so that the upper bound in (a) is smaller than $10^{-4}$.

(c) Use part (b) to find an approximation for $\int_0^{\infty} e^{-x^2} \, dx$ correct to three decimal places.
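The following Python sketch carries out parts (b) and (c), assuming a tolerance of $10^{-4}$ for the tail bound. It scans $m$ on a grid of step 0.1 until $e^{-m^2}/m$ drops below the tolerance, then integrates $e^{-x^2}$ over $[0, m]$ with composite Simpson's rule using 200 subintervals (an arbitrary choice). The helper names are illustrative.

```python
import math

def tail_bound(m: float) -> float:
    # Upper bound from part (a): integral_m^inf e^{-x^2} dx < e^{-m^2} / m
    return math.exp(-m * m) / m

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

tol = 1e-4  # assumed tolerance for part (b)
k = 1
while tail_bound(k / 10) >= tol:  # search m = 0.1, 0.2, ...
    k += 1
m = k / 10
approx = simpson(lambda x: math.exp(-x * x), 0.0, m, 200)
print(m, tail_bound(m), approx, math.sqrt(math.pi) / 2)
```

With these settings the search stops at $m = 2.9$, and the truncated integral agrees with the exact value $\sqrt{\pi}/2 \approx 0.886227$ to three decimal places.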

12.5. Obtain an approximate value for the integral $\int_0^{\infty} x(e^x + e^{-x} - 1)^{-1} \, dx$ using the Gauss-Laguerre quadrature. [Hint: Use the tables in Appendix C of the book by Krylov (1962, page 347) giving the zeros of Laguerre polynomials and the corresponding values of $w_i$.]

12.6. Consider the indefinite integral

$$I(x) = \int_0^x \frac{dt}{1 + t^2} = \operatorname{Arctan} x.$$

(a) Make an appropriate change of variables to show that $I(x)$ can be written as

$$I(x) = 2x \int_{-1}^{1} \frac{du}{4 + x^2(u + 1)^2}.$$

(b) Use a five-point Gauss-Legendre quadrature to provide an approximation for $I(x)$.
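A short Python sketch of part (b), assuming NumPy is available: `numpy.polynomial.legendre.leggauss` returns the nodes and weights of the Gauss-Legendre rule on $[-1, 1]$, which are applied to the transformed integrand from part (a) (obtained with the substitution $t = x(u + 1)/2$). The function name is illustrative.

```python
import math
import numpy as np

def arctan_gauss_legendre(x: float, points: int = 5) -> float:
    """Approximate I(x) = 2x * integral_{-1}^{1} du / (4 + x^2 (u + 1)^2)
    with a Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(points)
    integrand = 1.0 / (4.0 + x**2 * (nodes + 1.0) ** 2)
    return float(2.0 * x * np.dot(weights, integrand))

for x in (0.5, 1.0, 2.0):
    print(x, arctan_gauss_legendre(x), math.atan(x))
```

For moderate $x$ the five-point value agrees with $\operatorname{Arctan} x$ to several decimal places; accuracy drops as $x$ grows, because the complex poles of the integrand, at $u = -1 \pm 2i/x$, approach the interval of integration.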

12.7. Investigate the asymptotic behavior of the integral

$$I(\lambda) = \int_0^{\pi/2} (\cos x)^{\lambda} \, dx$$

as $\lambda \to \infty$.

12.8. (a) Use the Gauss-Laguerre quadrature to approximate the integral fQe~^^ sin xdx using $n = 1, 2, 3, 4$, and compare the results with the true value of the integral.





(b) Use the Gauss-Hermite quadrature to approximate the integral $\int_{-\infty}^{\infty} |x| \exp(-3x^2) \, dx$ using $n = 1, 2, 3, 4$, and compare the results with the true value of the integral.

12.9. Give an approximation to the double integral


1 m-x?)V2 


// 

Jn Jn 


(1 —xf dx^ dx 


0 •'0 


by applying the Gauss-Legendre rule to formulas (12.32) and (12.33).

12.10. Show that dx is asymptotically equal to $\frac{1}{2}(\pi/n)^{1/2}$ as $n \to \infty$.

In Statistics

12.11. Consider a sample, $X_1, X_2, \ldots, X_n$, of independent and identically distributed random variables from the standard normal distribution. Suppose that $n$ is odd. The sample median is the $(m+1)$th order statistic $X_{(m+1)}$, where $m = (n-1)/2$. It is known that $X_{(m+1)}$ has the density function

$$f(x) = \frac{n!}{(m!)^2} \, \Phi^m(x) [1 - \Phi(x)]^m \phi(x),$$

where

$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

is the standard normal density function, and

$$\Phi(x) = \int_{-\infty}^{x} \phi(t) \, dt$$

(see Roussas, 1973, page 194). Since the mean of $X_{(m+1)}$ is zero, the variance of $X_{(m+1)}$ is given by

$$\frac{n!}{(m!)^2} \int_{-\infty}^{\infty} x^2 \Phi^m(x) [1 - \Phi(x)]^m \phi(x) \, dx.$$

Obtain an approximation for this variance using the Gauss-Hermite quadrature for $n = 11$ and varying numbers of quadrature points.





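12.12. (Morland, 1998.) Consider the density function

$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

for the standard normal distribution. Show that if $x > 0$, then the cumulative distribution function,

$$\Phi(x) = \int_{-\infty}^{x} \phi(t) \, dt,$$

can be represented as the sum of the series

$$\Phi(x) = \frac{1}{2} + \frac{x}{\sqrt{2\pi}} \left[ 1 - \frac{x^2}{6} + \frac{x^4}{40} - \frac{x^6}{336} + \frac{x^8}{3456} - \cdots + (-1)^n \frac{x^{2n}}{(2n+1) 2^n n!} + \cdots \right].$$

[Note: By truncating this series, we can obtain a polynomial approximation of $\Phi(x)$. For example, we have the following approximation of order 11:

$$\Phi(x) \approx \frac{1}{2} + \frac{x}{\sqrt{2\pi}} \left( 1 - \frac{x^2}{6} + \frac{x^4}{40} - \frac{x^6}{336} + \frac{x^8}{3456} - \frac{x^{10}}{42240} \right).]$$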
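The truncated series can be checked numerically. This Python sketch sums the first `terms` terms of the bracketed series (six terms give the order-11 polynomial in the note) and compares the result with $\Phi(x)$ computed from `math.erf`. The function names are illustrative.

```python
import math

def phi_series(x: float, terms: int = 6) -> float:
    """Truncated series 1/2 + x/sqrt(2*pi) * sum_{k < terms} (-1)^k x^(2k) / ((2k+1) 2^k k!)."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k) / ((2 * k + 1) * 2**k * math.factorial(k))
    return 0.5 + x / math.sqrt(2.0 * math.pi) * total

def phi_exact(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (0.5, 1.0, 2.0):
    print(x, phi_series(x), phi_exact(x), abs(phi_series(x) - phi_exact(x)))
```

The order-11 polynomial is very accurate near the origin (an error below $10^{-5}$ at $x = 1$) but degrades as $x$ grows, since the neglected terms $x^{2n+1}/[(2n+1)2^n n!]$ are no longer small.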



12.13. Use the result in Exercise 12.12 to show that there exist constants $a$, $b$, $c$, $d$, such that

$$\Phi(x) \approx \frac{1}{2} + \frac{x}{\sqrt{2\pi}} \cdot \frac{1 + ax^2 + bx^4}{1 + cx^2 + dx^4}.$$


12.14. Apply the method of importance sampling to approximate the value of the integral dx using a sample of size $n = 150$ from the distribution whose density function is $g(x) = \frac{1}{4}(1 + x)$, $0 < x < 2$. Compare your answer with the one you would get from applying formula (12.36).
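A Python sketch of the importance-sampling estimator $\frac{1}{n}\sum_{i=1}^{n} h(X_i)/g(X_i)$ with $X_i$ drawn from $g$. Because the integrand is not legible in this copy, the sketch uses a hypothetical integrand $h(x) = x^2$, whose integral over $(0, 2)$ is $8/3$, purely for illustration. Draws from $g$ invert its CDF $G(x) = (x + x^2/2)/4$, which gives $X = -1 + \sqrt{1 + 8U}$ with $U$ uniform on $(0, 1)$.

```python
import math
import random

def sample_g(rng: random.Random) -> float:
    # Inverse-CDF draw from g(x) = (1 + x)/4 on (0, 2):
    # G(x) = (x + x^2/2)/4 = u  =>  x = -1 + sqrt(1 + 8u).
    u = rng.random()
    return -1.0 + math.sqrt(1.0 + 8.0 * u)

def importance_estimate(h, n: int, seed: int = 0) -> float:
    """Average of h(X)/g(X) over n draws X ~ g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_g(rng)
        total += h(x) / ((1.0 + x) / 4.0)
    return total / n

# Hypothetical integrand for illustration only: h(x) = x^2 on (0, 2).
print(importance_estimate(lambda x: x * x, 150), 8 / 3)
```

Rerunning with different seeds, or with a larger `n`, shows how much Monte Carlo error to expect at $n = 150$.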





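12.15. Consider the density function

$$f(x) = \frac{\Gamma[(n+1)/2]}{\sqrt{n\pi} \, \Gamma(n/2)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}, \quad -\infty < x < \infty,$$

for a $t$-distribution with $n$ degrees of freedom (see Exercise 6.28). Let $F(x) = \int_{-\infty}^{x} f(t) \, dt$ be its cumulative distribution function. Show that for large $n$,

$$F(x) \approx \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt,$$

which is the cumulative distribution function for the standard normal. [Hint: Apply Stirling's formula.]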



APPENDIX


Solutions to Selected Exercises 


CHAPTER 1 

1.3. $A \cup C \subset B$ implies $A \subset B$. Hence, $A = A \cap B$. Thus, $A \cap B \subset \bar{C}$ implies $A \subset \bar{C}$. It follows that $A \cap C = \emptyset$.

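1.4. (a) $x \in A \Delta B$ implies that $x \in A$ but $x \notin B$, or $x \in B$ but $x \notin A$; thus $x \in (A \cup B) - (A \cap B)$. Vice versa, if $x \in (A \cup B) - (A \cap B)$, then $x \in A \Delta B$.

(c) $x \in A \cap (B \Delta D)$ implies that $x \in A$ and $x \in B \Delta D$, so that either $x \in A \cap B$ but $x \notin A \cap D$, or $x \in A \cap D$ but $x \notin A \cap B$, so that $x \in (A \cap B) \Delta (A \cap D)$. Vice versa, if $x \in (A \cap B) \Delta (A \cap D)$, then either $x \in A \cap B$ but $x \notin A \cap D$, or $x \in A \cap D$ but $x \notin A \cap B$, so that either $x$ is in $A$ and $B$ but $x \notin D$, or $x$ is in $A$ and $D$ but $x \notin B$; thus $x \in A \cap (B \Delta D)$.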

1.5. It is obvious that $\rho$ is reflexive, symmetric, and transitive. Hence, it is an equivalence relation. If $(m_0, n_0)$ is an element in $A$, then its equivalence class is the set of all pairs $(m, n)$ in $A$ such that $m/m_0 = n/n_0$, that is, $m/n = m_0/n_0$.

1.6. The equivalence class of $(1, 2)$ consists of all pairs $(m, n)$ in $A$ such that $m - n = -1$.

1.7. The first elements in all four pairs are distinct.

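1.8. (a) If $y \in f(\bigcup_{i=1}^{n} A_i)$, then $y = f(x)$, where $x \in \bigcup_{i=1}^{n} A_i$. Hence, $x \in A_i$ and $f(x) \in f(A_i)$ for some $i$, so $f(x) \in \bigcup_{i=1}^{n} f(A_i)$; thus $f(\bigcup_{i=1}^{n} A_i) \subset \bigcup_{i=1}^{n} f(A_i)$. Vice versa, it is easy to show that

$$\bigcup_{i=1}^{n} f(A_i) \subset f\Big(\bigcup_{i=1}^{n} A_i\Big).$$

(b) If $y \in f(\bigcap_{i=1}^{n} A_i)$, then $y = f(x)$, where $x \in A_i$ for all $i$; then $f(x) \in f(A_i)$ for all $i$; then $f(x) \in \bigcap_{i=1}^{n} f(A_i)$. Equality holds if $f$ is one-to-one.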




1.11. Define $f: J^+ \to A$ such that $f(n) = 2n^2 + 1$. Then $f$ is one-to-one and onto, so $A$ is countable.

1.13. $a + \sqrt{b} = c + \sqrt{d} \Rightarrow a - c = \sqrt{d} - \sqrt{b}$. If $a = c$, then $b = d$. If $a \ne c$, then $\sqrt{d} - \sqrt{b}$ is a nonzero rational number and $\sqrt{d} + \sqrt{b} = (d - b)/(\sqrt{d} - \sqrt{b})$. It follows that both $\sqrt{d} - \sqrt{b}$ and $\sqrt{d} + \sqrt{b}$ are rational numbers, and therefore $\sqrt{b}$ and $\sqrt{d}$ must be rational numbers.

1.14. Let $g = \inf(A)$. Then $g \le x$ for all $x$ in $A$, so $-g \ge -x$; hence, $-g$ is an upper bound of $-A$ and is the least upper bound: if $-g'$ is any other upper bound of $-A$, then $-g' \ge -x$, so $x \ge g'$; hence, $g'$ is a lower bound of $A$, so $g' \le g$, that is, $-g' \ge -g$, so $-g = \sup(-A)$.

1.15. Suppose that $b \notin A$. Since $b$ is the least upper bound of $A$, it must be a limit point of $A$: for any $\epsilon > 0$, there exists an element $a \in A$ such that $a > b - \epsilon$. Furthermore, $a < b$, since $b \notin A$. Hence, $b$ is a limit point of $A$. But $A$ is closed; hence, by Theorem 1.6.4, $b \in A$, a contradiction.


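1.17. Suppose that $\mathcal{G}$ is a basis for $\mathcal{T}$, and let $B \in \mathcal{T}$ and $p \in B$. Then $B = \bigcup_{\alpha} U_{\alpha}$, where each $U_{\alpha}$ belongs to $\mathcal{G}$. Hence, there is at least one $U_{\alpha}$ such that $p \in U_{\alpha} \subset B$. Vice versa, if for each $B \in \mathcal{T}$ and each $p \in B$ there is a $U \in \mathcal{G}$ such that $p \in U \subset B$, then $\mathcal{G}$ must be a basis for $\mathcal{T}$: for each $p \in B$ we can find a set $U_p \in \mathcal{G}$ such that $p \in U_p \subset B$; then $B = \bigcup\{U_p : p \in B\}$, so $\mathcal{G}$ is a basis.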


1.18. Let $p$ be a limit point of $A \cup B$. Then $p$ is a limit point of $A$ or of $B$. In either case, $p \in A \cup B$, since $A$ and $B$ are closed. Hence, by Theorem 1.6.4, $A \cup B$ is a closed set.

1.19. Let $\{C_{\alpha}\}$ be an open covering of $B$. Then $\bar{B}$ and $\{C_{\alpha}\}$ form an open covering of $A$. Since $A$ is compact, a finite subcollection of the latter covering covers $A$. Furthermore, since $\bar{B}$ does not cover $B$, the members of this finite covering that cover $B$ are all in $\{C_{\alpha}\}$. Hence, $B$ is compact.

1.20. No. Let $(A, \mathcal{T})$ be a topological space such that $A$ consists of two points $a$ and $b$ and $\mathcal{T}$ consists of $A$ and the empty set $\emptyset$. Then $A$ is compact, and the point $a$ is a compact subset; but it is not closed, since the complement of $a$, namely $b$, is not a member of $\mathcal{T}$ and is therefore not open.

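1.21. (a) Let $w \in A$. Then $X(w) \le x < x + 3^{-n}$ for all $n$, so $w \in \bigcap_{n=1}^{\infty} A_n$. Vice versa, if $w \in \bigcap_{n=1}^{\infty} A_n$, then $X(w) < x + 3^{-n}$ for all $n$. To show that $X(w) \le x$: if $X(w) > x$, then $X(w) > x + 3^{-n}$ for some $n$, a contradiction. Therefore, $X(w) \le x$, and $w \in A$.

(b) Let $w \in B$. Then $X(w) < x$, so $X(w) \le x - 3^{-n}$ for some $n$; hence, $w \in \bigcup_{n=1}^{\infty} B_n$. Vice versa, if $w \in \bigcup_{n=1}^{\infty} B_n$, then $X(w) \le x - 3^{-n}$ for some $n$, so $X(w) < x$: if $X(w) \ge x$, then $X(w) > x - 3^{-n}$ for all $n$, a contradiction. Therefore, $w \in B$.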


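1.22. (a) $P(X \ge 2) \le \lambda/2$ by Markov's inequality, since $\mu = \lambda$.

(b)

$$P(X \ge 2) = 1 - P(X < 2) = 1 - p(0) - p(1) = 1 - e^{-\lambda}(\lambda + 1).$$

To show that

$$1 - e^{-\lambda}(\lambda + 1) \le \frac{\lambda}{2},$$

let $\phi(\lambda) = \lambda/2 + e^{-\lambda}(\lambda + 1) - 1$. Then

i. $\phi(0) = 0$, and

ii. $\phi'(\lambda) = \frac{1}{2} - \lambda e^{-\lambda} > 0$ for all $\lambda \ge 0$, since $\lambda e^{-\lambda} \le e^{-1} < \frac{1}{2}$.

From (i) and (ii) we conclude that $\phi(\lambda) \ge 0$ for all $\lambda \ge 0$.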
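A quick numerical check of (b) in Python: it tabulates the exact Poisson tail $P(X \ge 2)$, the Markov bound $\lambda/2$, and $\phi(\lambda)$ for a few values of $\lambda$. This illustrates, but does not replace, the monotonicity argument above; the function names are illustrative.

```python
import math

def exact_tail(lam: float) -> float:
    """P(X >= 2) for X ~ Poisson(lam)."""
    return 1.0 - math.exp(-lam) * (lam + 1.0)

def phi(lam: float) -> float:
    """phi(lam) = lam/2 + e^{-lam}(lam + 1) - 1, i.e. Markov bound minus exact tail."""
    return lam / 2.0 + math.exp(-lam) * (lam + 1.0) - 1.0

for lam in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"lambda={lam}: P(X>=2)={exact_tail(lam):.4f}, bound={lam / 2:.4f}, phi={phi(lam):.4f}")
```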


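1.23. (a)

$$P(|X - \mu| \ge c) = P[(X - \mu)^2 \ge c^2] \le \frac{\sigma^2}{c^2}$$

by Markov's inequality and the fact that $E(X - \mu)^2 = \sigma^2$.

(b) Use $c = k\sigma$ in (a).

(c)

$$P(|X - \mu| < k\sigma) = 1 - P(|X - \mu| \ge k\sigma) \ge 1 - \frac{1}{k^2}.$$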


1.24. $\mu = E(X) = \int_{-1}^{1} x(1 - |x|) \, dx = 0$, $\sigma^2 = \int_{-1}^{1} x^2(1 - |x|) \, dx = \frac{1}{6}$.

(a) 

i. $P(|X| \ge \frac{1}{2}) \le \sigma^2/(1/4) = \frac{2}{3}$

ii. P(\X\ >2)=p(\x\ >l)<a^/l=l 

(b) 

$$P(|X| \ge \tfrac{1}{2}) = P(X \ge \tfrac{1}{2}) + P(X \le -\tfrac{1}{2}) = \frac{1}{4} \le \frac{2}{3}.$$





1.25. (a) True, since $X_{(1)} > x$ if and only if $X_i > x$ for all $i$.

(b) True, since $X_{(n)} \le x$ if and only if $X_i \le x$ for all $i$.

(c)

$$P(X_{(1)} \le x) = 1 - P(X_{(1)} > x) = 1 - [1 - F(x)]^n.$$

(d) $P(X_{(n)} \le x) = [F(x)]^n$.

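1.26. $P(2 \le X_{(1)} \le 3) = P(X_{(1)} \le 3) - P(X_{(1)} \le 2)$. Hence,

$$P(2 \le X_{(1)} \le 3) = 1 - [1 - F(3)]^5 - 1 + [1 - F(2)]^5 = [1 - F(2)]^5 - [1 - F(3)]^5.$$

But $F(x) = \int_0^x 2e^{-2t} \, dt = 1 - e^{-2x}$. Hence,

$$P(2 \le X_{(1)} \le 3) = (e^{-4})^5 - (e^{-6})^5 = e^{-20} - e^{-30}.$$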


CHAPTER 2 

2.1. If $m > n$, then the rank of the $n \times m$ matrix $U = [u_1 : u_2 : \cdots : u_m]$ is less than or equal to $n$. Hence, the number of linearly independent columns of $U$ is less than or equal to $n$, so the $m$ columns of $U$ must be linearly dependent.

2.2. If $u_1, u_2, \ldots, u_n$ and $v$ are linearly dependent, then $v$ must belong to $W$, a contradiction.

2.5. Suppose that $u_1, u_2, \ldots, u_n$ are linearly independent in $U$, and that $\sum_{i=1}^{n} \alpha_i T(u_i) = 0$ for some constants $\alpha_1, \alpha_2, \ldots, \alpha_n$. Then $T(\sum_{i=1}^{n} \alpha_i u_i) = 0$. If $T$ is one-to-one, then $\sum_{i=1}^{n} \alpha_i u_i = 0$; hence, $\alpha_i = 0$ for all $i$, since the $u_i$'s are linearly independent. It follows that $T(u_1), T(u_2), \ldots, T(u_n)$ must also be linearly independent. Vice versa, let $u \in U$ such that $T(u) = 0$. Let $e_1, e_2, \ldots, e_n$ be a basis for $U$. Then $u = \sum_{i=1}^{n} \tau_i e_i$ for some constants $\tau_1, \tau_2, \ldots, \tau_n$. $T(u) = 0 \Rightarrow \sum_{i=1}^{n} \tau_i T(e_i) = 0 \Rightarrow \tau_i = 0$ for all $i$, since $T(e_1), T(e_2), \ldots, T(e_n)$ are linearly independent. It follows that $u = 0$ and $T$ is one-to-one.


2.7. If $A = (a_{ij})$, then $\operatorname{tr}(A'A) = \sum_i \sum_j a_{ij}^2$. Hence, $A = 0$ if and only if $\operatorname{tr}(A'A) = 0$.





2.8. It is sufficient to show that $Av = 0$ if $v'Av = 0$. If $v'Av = 0$, then $A^{1/2}v = 0$, since $v'Av = (A^{1/2}v)'(A^{1/2}v)$, and hence $Av = A^{1/2}(A^{1/2}v) = 0$. [Note: $A^{1/2}$ is defined as follows: since $A$ is symmetric, it can be written as $A = P\Lambda P'$ by Theorem 2.3.10, where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $A$, and $P$ is an orthogonal matrix. Furthermore, since $A$ is positive semidefinite, its eigenvalues are greater than or equal to zero by Theorem 2.3.13. The matrix $A^{1/2}$ is defined as $P\Lambda^{1/2}P'$, where the diagonal elements of $\Lambda^{1/2}$ are the square roots of the corresponding diagonal elements of $\Lambda$.]

2.9. By Theorem 2.3.15, there exist an orthogonal matrix $P$ and diagonal matrices $\Lambda_1$, $\Lambda_2$ such that $A = P\Lambda_1P'$, $B = P\Lambda_2P'$. The diagonal elements of $\Lambda_1$ and $\Lambda_2$ are nonnegative. Hence,

$$AB = P\Lambda_1\Lambda_2P'$$

is positive semidefinite, since the diagonal elements of $\Lambda_1\Lambda_2$ are nonnegative.

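2.10. Let $C = AB$. Then $C' = -BA$. Since $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, we have $\operatorname{tr}(C) = \operatorname{tr}(-C') = -\operatorname{tr}(C)$, and thus $\operatorname{tr}(C) = 0$.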

2.11. Let B = A — A. Then 

tr (B' B) = tr [ (A - A) ( A - A ) ] 

= tr[(A- A)A] -tr[(A-A)A] 

= tr[(A— A)A] — tr[A(A — A)] 

= tr[(A— A)A] — tr[(A — A)A] 

= 2tr[(A-A)A] =0, 

since A — A is skew-symmetric. Hence, by Exercise 2.7, B = 0 and thus 
A = A. 

2.12. This follows easily from Theorem 2.3.9. 

2.13. By Theorem 2.3.10, we can write

$$A - \lambda I_n = P(\Lambda - \lambda I_n)P',$$

where $P$ is an orthogonal matrix and $\Lambda$ is diagonal with diagonal elements equal to the eigenvalues of $A$. The diagonal elements of $\Lambda - \lambda I_n$ are the eigenvalues of $A - \lambda I_n$. Now, $k$ diagonal elements of $\Lambda - \lambda I_n$ are equal to zero, and the remaining $n - k$ elements must be different from zero. Hence, $A - \lambda I_n$ has rank $n - k$.





2.17. If $AB = BA = B$, then $(A - B)^2 = A^2 - AB - BA + B^2 = A - 2B + B = A - B$. Vice versa, if $A - B$ is idempotent, then $A - B = (A - B)^2 = A^2 - AB - BA + B^2 = A - AB - BA + B$. Thus,

$$AB + BA = 2B,$$

so, premultiplying by $A$, $AB + ABA = 2AB$ and thus $AB = ABA$. Postmultiplying by $A$ instead, we also have

$$ABA + BA = 2BA, \quad \text{hence } BA = ABA.$$

It follows that $AB = BA$. We finally conclude that

$$B = AB = BA.$$

2.18. If $A$ is an $n \times n$ orthogonal matrix with determinant 1, then its eigenvalues are of the form $e^{\pm i\phi_1}, e^{\pm i\phi_2}, \ldots, e^{\pm i\phi_q}, 1$, where 1 is of multiplicity $n - 2q$ and none of the real numbers $\phi_j$ is a multiple of $2\pi$ ($j = 1, 2, \ldots, q$) (those that are odd multiples of $\pi$ give an even number of eigenvalues equal to $-1$).

2.19. Using Theorem 2.3.10, it is easy to show that

$$e_{\min}(A) I_n \le A \le e_{\max}(A) I_n,$$

where the inequality on the right means that $e_{\max}(A) I_n - A$ is positive semidefinite, and the one on the left means that $A - e_{\min}(A) I_n$ is positive semidefinite. It follows that

$$e_{\min}(A) L'L \le L'AL \le e_{\max}(A) L'L.$$

Hence, by Theorem 2.3.19(1),

$$e_{\min}(A) \operatorname{tr}(L'L) \le \operatorname{tr}(L'AL) \le e_{\max}(A) \operatorname{tr}(L'L).$$

2.20. (a) We have that $A \ge e_{\min}(A) I_n$ by Theorem 2.3.10. Hence, $L'AL \ge e_{\min}(A) L'L$, and therefore, $e_{\min}(L'AL) \ge e_{\min}(A) e_{\min}(L'L)$ by Theorem 2.3.18 and the fact that $e_{\min}(A) \ge 0$ and $L'L$ is nonnegative definite.

(b) This is similar to (a).

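2.21. We have that

$$e_{\min}(B) A \le A^{1/2} B A^{1/2} \le e_{\max}(B) A,$$

since $A$ is nonnegative definite. Hence,

$$e_{\min}(B) \operatorname{tr}(A) \le \operatorname{tr}(A^{1/2} B A^{1/2}) \le e_{\max}(B) \operatorname{tr}(A).$$

The result follows by noting that

$$\operatorname{tr}(AB) = \operatorname{tr}(A^{1/2} B A^{1/2}).$$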

2.23. (a) $A = I_n - (1/n)J_n$, where $J_n$ is the matrix of ones of order $n \times n$. $A^2 = A$, since $J_n^2 = nJ_n$. The rank of $A$ is the same as its trace, which is equal to $n - 1$.

(b) $(n - 1)s^2/\sigma^2 = (1/\sigma^2) y'Ay \sim \chi^2_{n-1}$, since $(1/\sigma^2)A(\sigma^2 I_n)$ is idempotent of rank $n - 1$, and the noncentrality parameter is zero.

(c)

$$\operatorname{Cov}(\bar{y}, Ay) = \operatorname{Cov}\Big(\frac{1}{n} 1_n' y, Ay\Big) = \frac{1}{n} 1_n'(\sigma^2 I_n)A = \frac{\sigma^2}{n} 1_n' A = 0'.$$

Since $y$ is normally distributed, both $\bar{y}$ and $Ay$, being linear transformations of $y$, are also normally distributed. Hence, they must be independent, since they are uncorrelated. We can similarly show that $\bar{y}$ and $A^{1/2}y$ are uncorrelated, and hence independent, by the fact that $1_n'A = 0'$ and thus $1_n'A1_n = 0$, which means $1_n'A^{1/2} = 0'$. It follows that $\bar{y}$ and $y'A^{1/2}A^{1/2}y = y'Ay$ are independent.
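The algebraic facts in (a), and the identity $y'Ay = (n - 1)s^2$ used in (b), are easy to confirm numerically. The Python/NumPy sketch below uses an arbitrary $n = 6$ and simulated data chosen only for illustration.

```python
import numpy as np

n = 6                                   # illustrative size
A = np.eye(n) - np.ones((n, n)) / n     # A = I_n - (1/n) J_n
ones = np.ones(n)

print(np.allclose(A @ A, A))                    # (a) A is idempotent
print(np.linalg.matrix_rank(A), np.trace(A))    # rank = trace = n - 1
print(np.allclose(ones @ A, 0.0))               # 1_n' A = 0'

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=n)      # simulated data
s2 = (y @ A @ y) / (n - 1)
print(np.isclose(s2, np.var(y, ddof=1)))        # y'Ay / (n - 1) equals s^2
```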

2.24. (a) The model is written as

$$y = X\beta + \epsilon,$$

where $X = [1_n : \bigoplus_{i=1}^{a} 1_{n_i}]$, $\bigoplus$ denotes the direct sum of matrices, and $\beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_a)'$. Now, $\mu + \alpha_i$ is estimable, since it can be written as $a_i'\beta$, where $a_i'$ is a row of $X$, $i = 1, 2, \ldots, a$. It follows that $\alpha_i - \alpha_j = (a_i' - a_j')\beta$ is estimable, since the vector $a_i' - a_j'$ belongs to the row space of $X$.

(b) Suppose that $\mu$ is estimable; then it can be written as

$$\mu = \sum_{i=1}^{a} \tau_i(\mu + \alpha_i),$$

where $\tau_1, \tau_2, \ldots, \tau_a$ are constants, since $\mu + \alpha_1, \mu + \alpha_2, \ldots, \mu + \alpha_a$ form a basic set of estimable linear functions. We must therefore have $\sum_{i=1}^{a} \tau_i = 1$, and $\tau_i = 0$ for all $i$, a contradiction. Therefore, $\mu$ is nonestimable.


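2.25. (a) $X(X'X)^{-}X'X(X'X)^{-}X' = X(X'X)^{-}X'$ by Theorem 2.3.3(2).

(b) $E(l'y) = \lambda'\beta \Rightarrow l'X\beta = \lambda'\beta$ for all $\beta$. Hence, $\lambda' = l'X$. Now,

$$\begin{aligned}
\operatorname{Var}(\lambda'\hat{\beta}) &= \lambda'(X'X)^{-}X'X(X'X)^{-}\lambda \, \sigma^2 \\
&= l'X(X'X)^{-}X'X(X'X)^{-}X'l \, \sigma^2 \\
&= l'X(X'X)^{-}X'l \, \sigma^2, \quad \text{by Theorem 2.3.3(2)} \\
&\le l'l \, \sigma^2,
\end{aligned}$$

since $I_n - X(X'X)^{-}X'$ is positive semidefinite. The result follows from the last inequality, since $\operatorname{Var}(l'y) = l'l \, \sigma^2$.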

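2.26. $\hat{\beta}^* = P(\Lambda + kI_p)^{-1}P'X'y$. But $\hat{\beta} = P\Lambda^{-1}P'X'y$, so $\Lambda P'\hat{\beta} = P'X'y$. Hence,

$$\hat{\beta}^* = P(\Lambda + kI_p)^{-1}\Lambda P'\hat{\beta} = PDP'\hat{\beta}.$$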

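2.27.

$$\begin{aligned}
\sup_{x, y} \rho^2 &= \sup_{v, \tau} (v'C_1^{-1}AC_2^{-1}\tau)^2 \\
&= \sup_{v} \Big\{ \sup_{\tau} (v'C_1^{-1}AC_2^{-1}\tau)^2 \Big\} \\
&= \sup_{v} \Big\{ \sup_{\tau} (b'\tau)^2 \Big\}, \quad b' = v'C_1^{-1}AC_2^{-1} \\
&= \sup_{v} \Big\{ \sup_{\tau} (\tau'bb'\tau) \Big\} \\
&= \sup_{v} \{ e_{\max}(bb') \}, \quad \text{by applying (2.9)} \\
&= \sup_{v} \{ e_{\max}(b'b) \} \\
&= \sup_{v} \{ v'C_1^{-1}AC_2^{-2}A'C_1^{-1}v \} \\
&= e_{\max}(C_1^{-1}AC_2^{-2}A'C_1^{-1}), \quad \text{by (2.9)} \\
&= e_{\max}(C_1^{-2}AC_2^{-2}A') \\
&= e_{\max}(B_1^{-1}AB_2^{-1}A').
\end{aligned}$$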


CHAPTER 3 

3.1. (a)

$$\lim_{x \to 1} \frac{x^5 - 1}{x - 1} = \lim_{x \to 1} (1 + x + x^2 + x^3 + x^4) = 5.$$

(b) $|x \sin(1/x)| \le |x| \Rightarrow \lim_{x \to 0} x \sin(1/x) = 0$.

(c) The limit does not exist, since the function is equal to 1 except when $\sin(1/x) = 0$, that is, when $x = \pm 1/\pi, \pm 1/(2\pi), \ldots, \pm 1/(n\pi), \ldots$. For such values, the function takes the form 0/0, and is therefore undefined. To have a limit at $x = 0$, the function must be defined at all points in a deleted neighborhood of $x = 0$.

(d) $\lim_{x \to 0^-} f(x) \ne \lim_{x \to 0^+} f(x)$. Therefore, the function $f(x)$ does not have a limit as $x \to 0$.

3.2. (a) To show that $(\tan x^3)/x^2 \to 0$ as $x \to 0$: $(\tan x^3)/x^2 = (1/\cos x^3)(\sin x^3)/x^2$. But $|\sin x^3| \le |x^3|$. Hence, $|(\tan x^3)/x^2| \le |x|/|\cos x^3| \to 0$ as $x \to 0$.

(b) $x/\sqrt{x} \to 0$ as $x \to 0$.

(c) $O(1)$ is bounded as $x \to \infty$. Therefore, $O(1)/x \to 0$ as $x \to \infty$, so $O(1) = o(x)$ as $x \to \infty$.

(d)

$$f(x)g(x) = [x + o(x^2)]\Big[\frac{1}{x^2} + O\Big(\frac{1}{x}\Big)\Big] = \frac{1}{x} + xO\Big(\frac{1}{x}\Big) + \frac{1}{x^2}o(x^2) + o(x^2)O\Big(\frac{1}{x}\Big).$$

Now, by definition, $|xO(1/x)|$ is bounded by a positive constant as $x \to 0$, and $o(x^2)/x^2 \to 0$ as $x \to 0$. Hence, $o(x^2)/x^2$ is bounded, that is, $O(1)$. Furthermore, $o(x^2)O(1/x) = x[o(x^2)/x^2][xO(1/x)]$, which goes to zero as $x \to 0$, is also bounded. It follows that

$$f(x)g(x) = \frac{1}{x} + O(1).$$

3.3. (a) $f(0) = 0$ and $\lim_{x \to 0} f(x) = 0$. Hence, $f(x)$ is continuous at $x = 0$. It is also continuous at all other values of $x$.

(b) The function /(x) is defined on [1, 2], but lim^ ^ ^ 

is not continuous at x = 2.






(c) If $n$ is odd, then $f(x)$ is continuous everywhere, except at $x = 0$. If $n$ is even, $f(x)$ will be continuous only when $x > 0$. ($m$ and $n$ must be expressed in their lowest terms so that the only common divisor is 1.)

(d) $f(x)$ is continuous for $x \ne 1$.


3 . 6 . 


f(x) 


- 3, x ≠ 0,

0, x = 0. 


Thus f(x) is continuous everywhere except at x = 0. 


3.7. $\lim_{x \to 0^-} f(x) = 2$ and $\lim_{x \to 0^+} f(x) = 0$. Therefore, $f(x)$ cannot be made continuous at $x = 0$.


3.8. Letting $a = b = 0$ in $f(a + b) = f(a) + f(b)$, we conclude that $f(0) = 0$. Now, for any $x_1, x_2 \in R$,

$$f(x_1) = f(x_2) + f(x_1 - x_2).$$

Let $z = x_1 - x_2$. Then

$$|f(x_1) - f(x_2)| = |f(z)| = |f(z) - f(0)|.$$

Since $f(x)$ is continuous at $x = 0$, for a given $\epsilon > 0$ there exists a $\delta > 0$ such that $|f(z) - f(0)| < \epsilon$ if $|z| < \delta$. Hence, $|f(x_1) - f(x_2)| < \epsilon$ for all $x_1, x_2$ such that $|x_1 - x_2| < \delta$. Thus $f(x)$ is uniformly continuous everywhere in $R$.


3.9. $\lim_{x \to 1^-} f(x) = \lim_{x \to 1^+} f(x) = f(1) = 1$, so $f(x)$ is continuous at $x = 1$. It is also continuous everywhere else on $[0, 2]$, which is closed and bounded. Therefore, it is uniformly continuous by Theorem 3.4.6.


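3.10.

$$|\cos x_1 - \cos x_2| = 2\left|\sin\frac{x_1 - x_2}{2}\right| \left|\sin\frac{x_1 + x_2}{2}\right| \le 2\left|\sin\frac{x_1 - x_2}{2}\right| \le |x_1 - x_2| \quad \text{for all } x_1, x_2 \text{ in } R.$$

Thus, for a given $\epsilon > 0$, there exists a $\delta > 0$ (namely, $\delta < \epsilon$) such that $|\cos x_1 - \cos x_2| < \epsilon$ whenever $|x_1 - x_2| < \delta$.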


3.13. $f(x) = 0$ if $x$ is a rational number in $[a, b]$. If $x$ is an irrational number, then it is a limit of a sequence $\{y_n\}$ of rational numbers (any neighborhood of $x$ contains infinitely many rationals). Hence,

$$f(x) = f\Big(\lim_{n \to \infty} y_n\Big) = \lim_{n \to \infty} f(y_n) = 0,$$

since $f(x)$ is a continuous function. Thus $f(x) = 0$ for every $x$ in $[a, b]$.

3.14. $f(x)$ can be written as

$$f(x) = \begin{cases} 3 - 2x, & x < -1, \\ 5, & -1 \le x \le 1, \\ 3 + 2x, & x > 1. \end{cases}$$

Hence, $f(x)$ has a unique inverse for $x < -1$ and for $x > 1$.

3.15. The inverse function is 


— f-i 


i(y+l), y>l. 


3.16. (a) The inverse of $f(x)$ is $f^{-1}(y) = 2 - \sqrt{y/2}$.
(b) The inverse of $f(x)$ is $f^{-1}(y) = 2 + \sqrt{y/2}$.


3.17. By Theorem 3.6.1,

$$\sum_{i=1}^{n} |f(a_i)| < K\delta.$$

Choosing $\delta$ such that $\delta < \epsilon/K$, we get

$$\sum_{i=1}^{n} |f(a_i)| < \epsilon$$

for any given $\epsilon > 0$.

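3.18. This inequality can be proved by using mathematical induction: it is obviously true for $n = 1$. Suppose that it is true for $n = m$. To show that it is true for $n = m + 1$, let $A_k = \sum_{i=1}^{k} a_i$ and

$$b_m = \frac{\sum_{i=1}^{m} a_i x_i}{A_m}.$$

Then, by convexity of $f$,

$$f\Big(\frac{\sum_{i=1}^{m+1} a_i x_i}{A_{m+1}}\Big) = f\Big(\frac{A_m b_m + a_{m+1} x_{m+1}}{A_{m+1}}\Big) \le \frac{A_m f(b_m) + a_{m+1} f(x_{m+1})}{A_{m+1}}.$$

But $f(b_m) \le \sum_{i=1}^{m} a_i f(x_i)/A_m$ by the induction hypothesis. Hence,

$$f\Big(\frac{\sum_{i=1}^{m+1} a_i x_i}{A_{m+1}}\Big) \le \frac{\sum_{i=1}^{m+1} a_i f(x_i)}{A_{m+1}}.$$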


3.19. Let $a$ be a limit point of $S$. There exists a sequence $\{a_n\}_{n=1}^{\infty}$ in $S$ such that $\lim_{n \to \infty} a_n = a$ (if $S$ is finite, then $S$ is closed already). Hence, $f(a) = f(\lim_{n \to \infty} a_n) = \lim_{n \to \infty} f(a_n) = 0$. It follows that $a \in S$, and $S$ is therefore a closed set.

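3.20. Let $g(x) = \exp[f(x)]$. We have that

$$f[\lambda x_1 + (1 - \lambda)x_2] \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$

for all $x_1, x_2$ in $D$, $0 \le \lambda \le 1$. Hence,

$$g[\lambda x_1 + (1 - \lambda)x_2] \le \exp[\lambda f(x_1) + (1 - \lambda) f(x_2)] \le \lambda g(x_1) + (1 - \lambda) g(x_2),$$

since the exponential function is monotone increasing and convex. Hence, $g(x)$ is convex on $D$.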
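3.22. (a) We have that $E(|X|) \ge |E(X)|$. Let $X \sim N(\mu, \sigma^2)$. Then

$$E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\Big[-\frac{1}{2\sigma^2}(x - \mu)^2\Big] x \, dx.$$

Therefore,

$$|E(X)| \le \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\Big[-\frac{1}{2\sigma^2}(x - \mu)^2\Big] |x| \, dx = E(|X|).$$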


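(b) We have that $E(e^{-X}) \ge e^{-\mu}$, where $\mu = E(X)$, since $e^{-x}$ is a convex function. The density function of the exponential distribution with mean $\mu$ is

$$g(x) = \frac{1}{\mu} e^{-x/\mu}, \quad 0 < x < \infty.$$

Hence,

$$E(e^{-X}) = \frac{1}{\mu} \int_0^{\infty} e^{-x/\mu} e^{-x} \, dx = \frac{1}{\mu} \cdot \frac{1}{1/\mu + 1} = \frac{1}{\mu + 1}.$$

But

$$e^{\mu} = 1 + \mu + \frac{1}{2!}\mu^2 + \cdots + \frac{1}{n!}\mu^n + \cdots \ge 1 + \mu.$$

It follows that

$$e^{-\mu} \le \frac{1}{\mu + 1},$$

and thus

$$E(e^{-X}) \ge e^{-\mu}.$$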
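A Monte Carlo check of (b) in Python; the seed, sample size, and grid of means are arbitrary choices. It estimates $E(e^{-X})$ for an exponential $X$ with mean $\mu$ and compares the estimate with the exact value $1/(\mu + 1)$ and with the Jensen lower bound $e^{-\mu}$. Note that `random.expovariate` takes the rate $1/\mu$, not the mean.

```python
import math
import random

def mc_expectation(mu: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E(e^{-X}) for X exponential with mean mu."""
    rng = random.Random(seed)
    # expovariate takes the rate 1/mu, not the mean.
    return sum(math.exp(-rng.expovariate(1.0 / mu)) for _ in range(n)) / n

for mu in (0.5, 1.0, 3.0):
    print(mu, round(mc_expectation(mu, 100_000), 4), round(1 / (mu + 1), 4), round(math.exp(-mu), 4))
```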


3.23. $P(X_n^2 > \epsilon) = P(|X_n| > \epsilon^{1/2}) \to 0$ as $n \to \infty$, since $X_n$ converges in probability to zero.


3.24. Since $x^2$ is a convex function,

$$\sigma^2 = E[(X - \mu)^2] \ge E^2[|X - \mu|];$$

hence,

$$\sigma \ge E[|X - \mu|].$$





CHAPTER 4 
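4.1.

$$\begin{aligned}
\lim_{h \to 0} \frac{f(h) - f(-h)}{2h} &= \lim_{h \to 0} \frac{f(h) - f(0)}{2h} + \lim_{h \to 0} \frac{f(0) - f(-h)}{2h} \\
&= \frac{f'(0)}{2} + \frac{1}{2} \lim_{h \to 0} \frac{f(-h) - f(0)}{-h} \\
&= \frac{f'(0)}{2} + \frac{f'(0)}{2} = f'(0).
\end{aligned}$$

The converse is not true: let $f(x) = |x|$. Then

$$\frac{f(h) - f(-h)}{2h} = 0,$$

and its limit is zero. But $f'(0)$ does not exist.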

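4.3. For a given $\epsilon > 0$, there exists a $\delta > 0$ such that for any $x$ with $0 < |x - x_0| < \delta$,

$$\left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| < \epsilon.$$

Hence,

$$\left| \frac{f(x) - f(x_0)}{x - x_0} \right| < |f'(x_0)| + \epsilon,$$

and thus

$$|f(x) - f(x_0)| < A|x - x_0|,$$

where $A = |f'(x_0)| + \epsilon$.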


4.4. $g(x) = f(x + 1) - f(x) = f'(\xi)$, where $\xi$ is between $x$ and $x + 1$. As $x \to \infty$, we have $\xi \to \infty$ and $f'(\xi) \to 0$; hence, $g(x) \to 0$ as $x \to \infty$.

4.5. We have that

$$f'(1) = \lim_{x \to 1^+} \frac{x^3 - 2x + 1}{x - 1} = \lim_{x \to 1^-} \frac{ax^2 - bx + 1 + 1}{x - 1},$$

and thus $1 = 2a - b$. Furthermore, since $f(x)$ is continuous at $x = 1$, we must have $a - b + 1 = -1$. It follows that $a = 3$, $b = 5$, and

$$f'(x) = \begin{cases} 3x^2 - 2, & x \ge 1, \\ 6x - 5, & x < 1, \end{cases}$$

which is continuous everywhere.


4.6. Let $x > 0$.

(a) Taylor's theorem gives

$$f(x + 2h) = f(x) + 2hf'(x) + \frac{(2h)^2}{2!} f''(\xi),$$

where $x < \xi < x + 2h$, and $h > 0$. Hence,

$$f'(x) = \frac{1}{2h}[f(x + 2h) - f(x)] - hf''(\xi),$$

so that $|f'(x)| \le m_0/h + hm_2$.

(b) Since $m_1$ is the least upper bound of $|f'(x)|$,

$$m_1 \le \frac{m_0}{h} + hm_2.$$

Equivalently,

$$m_2h^2 - m_1h + m_0 \ge 0.$$

Since this inequality holds for all $h > 0$, the discriminant

$$\Delta = m_1^2 - 4m_0m_2$$

must be less than or equal to zero (if $\Delta > 0$, the quadratic would have two positive roots and would be negative between them), that is, $m_1^2 \le 4m_0m_2$.

4.7. No, unless $f'(x)$ is continuous at $x_0$. For example, the function

$$f(x) = \begin{cases} x + 1, & x > 0, \\ 0, & x = 0, \\ x - 1, & x < 0, \end{cases}$$

does not have a derivative at $x = 0$, but $\lim_{x \to 0} f'(x) = 1$.


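4.8.

$$\begin{aligned}
D'(y_j) &= \lim_{a \to y_j} \frac{D(a) - D(y_j)}{a - y_j} \\
&= \lim_{a \to y_j} \frac{|y_j - a| + \sum_{i \ne j} |y_i - a| - \sum_{i \ne j} |y_i - y_j|}{a - y_j},
\end{aligned}$$

which does not exist, since $\lim_{a \to y_j} |y_j - a|/(a - y_j)$ does not exist.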





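4.9.

$$\frac{d}{dx}\Big[\frac{f(x)}{x}\Big] = \frac{xf'(x) - f(x)}{x^2}.$$

$xf'(x) - f(x) \to 0$ as $x \to 0$. Hence, by l'Hospital's rule,

$$\lim_{x \to 0} \frac{d}{dx}\Big[\frac{f(x)}{x}\Big] = \lim_{x \to 0} \frac{xf''(x)}{2x} = \frac{1}{2} f''(0).$$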

4.10. Let $x_1, x_2 \in R$. Then

$$|f(x_1) - f(x_2)| = |(x_1 - x_2) f'(\xi)|,$$

where $\xi$ is between $x_1$ and $x_2$. Since $f'(x)$ is bounded for all $x$, $|f'(\xi)| < M$ for some positive constant $M$. Hence, for a given $\epsilon > 0$, there exists a $\delta > 0$, where $M\delta < \epsilon$, such that $|f(x_1) - f(x_2)| < \epsilon$ if $|x_1 - x_2| < \delta$. Thus $f(x)$ is uniformly continuous on $R$.

4.11. $f'(x) = 1 + cg'(x)$, and $|cg'(x)| \le cM$. Choose $c$ such that $cM < \frac{1}{2}$. In this case,

$$|cg'(x)| < \tfrac{1}{2},$$

so

$$-\tfrac{1}{2} < cg'(x) < \tfrac{1}{2},$$

and

$$\tfrac{1}{2} < f'(x) < \tfrac{3}{2}.$$

Hence, $f'(x)$ is positive and $f(x)$ is therefore strictly monotone increasing; thus $f(x)$ is one-to-one.


4.12. It is sufficient to show that $g'(x) > 0$ on $(0, \infty)$:

$$g'(x) = \frac{xf'(x) - f(x)}{x^2}, \quad x > 0.$$

By the mean value theorem,

$$f(x) = f(0) + xf'(c) = xf'(c), \quad 0 < c < x.$$

Hence,

$$g'(x) = \frac{xf'(x) - xf'(c)}{x^2} = \frac{f'(x) - f'(c)}{x}, \quad x > 0.$$

Since $f'(x)$ is monotone increasing, we have for all $x > 0$,

$$f'(x) > f'(c) \quad \text{and thus} \quad g'(x) > 0.$$
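4.14. Let $y = (1 + 1/x)^x$. Then,

$$\log y = x \log\Big(1 + \frac{1}{x}\Big) = \frac{\log(1 + 1/x)}{1/x}.$$

Applying l'Hospital's rule, we get

$$\lim_{x \to \infty} \log y = \lim_{x \to \infty} \frac{-\frac{1}{x^2}\Big(1 + \frac{1}{x}\Big)^{-1}}{-\frac{1}{x^2}} = 1.$$

Therefore, $\lim_{x \to \infty} y = e$.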


4.15. (a) $\lim_{x \to 0^+} f(x) = 1$, where $f(x) = (\sin x)^x$.

(b) lim^^ 0 + = 0? where g(x) = ^/x. 

4.16. Let $f(x) = [1 + ax + o(x)]^{1/x}$, and let $y = ax + o(x)$. Then $y = x[a + o(1)]$, and $[1 + ax + o(x)]^{1/x} = (1 + y)^{[a + o(1)]/y}$. Now, as $x \to 0$, we have $y \to 0$, $(1 + y)^{a/y} \to e^a$, and $(1 + y)^{o(1)/y} \to 1$, since

$$\frac{o(1)}{y} \log(1 + y) \to 0 \quad \text{as } y \to 0.$$

It follows that $f(x) \to e^a$.

4.17. No, because both f'{x) and g'(x) vanish at x = 0.541 in (0, 1). 

4.18. Let $g(x) = f(x) - \gamma(x - a)$. Then

$$g'(x) = f'(x) - \gamma, \quad g'(a) = f'(a) - \gamma < 0, \quad g'(b) = f'(b) - \gamma > 0.$$

The function $g(x)$ is continuous on $[a, b]$. Therefore, it must achieve its absolute minimum at some point $\xi$ in $[a, b]$. This point cannot be $a$ or $b$, since $g'(a) < 0$ and $g'(b) > 0$. Hence, $a < \xi < b$, and $g'(\xi) = 0$, so

$$f'(\xi) = \gamma.$$

4.19. Define $g(x) = f(x) - \tau(x - a)$, where

$$\tau = \frac{1}{n} \sum_{i=1}^{n} f'(x_i).$$

There exist $c_1$, $c_2$ such that

$$\max_i f'(x_i) = f'(c_2), \quad a < c_2 < b,$$

$$\min_i f'(x_i) = f'(c_1), \quad a < c_1 < b.$$

If $f'(x_1) = f'(x_2) = \cdots = f'(x_n)$, then the result is obviously true. Let us therefore assume that these $n$ derivatives are not all equal. In this case,

$$f'(c_1) < \tau < f'(c_2).$$

Apply now the result in Exercise 4.18 to conclude that there exists a point $c$ between $c_1$ and $c_2$ such that $f'(c) = \tau$.

4.20. We have that

$$\sum_{i=1}^{n} [f(y_i) - f(x_i)] = \sum_{i=1}^{n} f'(c_i)(y_i - x_i),$$

where $c_i$ is between $x_i$ and $y_i$ ($i = 1, 2, \ldots, n$). Using Exercise 4.19, there exists a point $c$ in $(a, b)$ such that
there exists a point c in (a, b) such that 




=f'{c). 


Hence, 


$$\sum_{i=1}^{n} [f(y_i) - f(x_i)] = f'(c) \sum_{i=1}^{n} (y_i - x_i).$$


4.21. $\log(1 + x) = \sum_{n=1}^{\infty} (-1)^{n-1} x^n/n$, $|x| < 1$.


4.23. /(x) has an absolute minimum at x = |. 





4.24. The function $f(x)$ is bounded if $x^2 + ax + b \ne 0$ for all $x$ in $[-1, 1]$. Let $\Delta = a^2 - 4b$. If $\Delta < 0$, then $x^2 + ax + b > 0$ for all $x$. The denominator has an absolute minimum at $x = -a/2$. Thus, if $-1 \le -a/2 \le 1$, then $f(x)$ will have an absolute maximum at $x = -a/2$. Otherwise, $f(x)$ attains its absolute maximum at $x = -1$ or $x = 1$. If $\Delta = 0$, then

$$f(x) = \frac{1}{(x + a/2)^2}.$$

In this case, the point $-a/2$ must fall outside $[-1, 1]$, and the absolute maximum of $f(x)$ is attained at $x = -1$ or $x = 1$. Finally, if $\Delta > 0$, then

$$f(x) = \frac{1}{(x - x_1)(x - x_2)},$$

where $x_1 = \frac{1}{2}(-a - \sqrt{\Delta})$, $x_2 = \frac{1}{2}(-a + \sqrt{\Delta})$. Both $x_1$ and $x_2$ must fall outside $[-1, 1]$. In this case, the point $x = -a/2$ is equal to $\frac{1}{2}(x_1 + x_2)$, which falls outside $[-1, 1]$. Thus $f(x)$ attains its absolute maximum at $x = -1$ or $x = 1$.

4.25. Let $H(y)$ denote the cumulative distribution function of $G^{-1}[F(Y)]$. Then

$$\begin{aligned}
H(y) &= P\{G^{-1}[F(Y)] \le y\} \\
&= P[F(Y) \le G(y)] \\
&= P\{Y \le F^{-1}[G(y)]\} \\
&= F\{F^{-1}[G(y)]\} \\
&= G(y).
\end{aligned}$$
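A quick Monte Carlo check can make the result of 4.25 concrete. The sketch below rests on illustrative choices that are not part of the exercise: $Y$ is taken to be standard exponential, so $F(y) = 1 - e^{-y}$, and the target $G$ is the standard logistic cdf. If the argument above holds, the empirical cdf of $G^{-1}[F(Y)]$ should agree with $G$ up to sampling error.

```python
import math
import random

def F(y):
    """cdf of Y: standard exponential (an assumed, illustrative choice)."""
    return 1.0 - math.exp(-y)

def G(y):
    """Target cdf: standard logistic (an assumed, illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-y))

def G_inv(p):
    """Inverse of the logistic cdf, with the endpoints 0 and 1 guarded."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log(p / (1.0 - p))

def empirical_cdf(y, n=50_000, seed=12345):
    """Fraction of simulated values G^{-1}[F(Y)] that are <= y."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if G_inv(F(rng.expovariate(1.0))) <= y)
    return hits / n

for y in (-2.0, 0.0, 1.5):
    print(f"y = {y:5.1f}   empirical H(y) = {empirical_cdf(y):.4f}   G(y) = {G(y):.4f}")
```

The fixed seed only makes the run reproducible; any other seed should give agreement to within a few thousandths.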

4.26. Let $g(w)$ be the density function of $W$. Then $g(w) = 2we^{-w^2}$, $w > 0$.

4.27. (a) Let g(w) be the density function of W. Then 


g{w) 


1 



W 


(b) Exact mean is $E(W) = 1.12$. Exact variance is $\operatorname{Var}(W) = 0.42$.

(c) $E(W) \approx 1$, $\operatorname{Var}(W) \approx 0.36$.





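4.28. Let $G(y)$ be the cumulative distribution function of $Y$. Then

$$\begin{aligned}
G(y) &= P(Y \le y) \\
&= P(Z^2 \le y) \\
&= P(|Z| \le y^{1/2}) \\
&= P(-y^{1/2} \le Z \le y^{1/2}) \\
&= F(y^{1/2}) - F(-y^{1/2}),
\end{aligned}$$

where $F(\cdot)$ is the cumulative distribution function of $Z$. Thus the density function of $Y$ is $g(y) = \dfrac{1}{\sqrt{2\pi y}}\, e^{-y/2}$, $y > 0$. This represents the density function of a chi-squared distribution with one degree of freedom.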
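The last step of 4.28 (differentiating $F(y^{1/2}) - F(-y^{1/2})$) can be checked numerically. This sketch builds the standard normal cdf from `math.erf`, differentiates $G$ by central differences, and compares the result with the chi-squared density $e^{-y/2}/\sqrt{2\pi y}$; the step size `h` is an arbitrary choice.

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def G(y):
    """cdf of Y = Z^2: F(sqrt(y)) - F(-sqrt(y))."""
    r = math.sqrt(y)
    return Phi(r) - Phi(-r)

def chi2_1_density(y):
    """Density obtained in the solution: e^{-y/2} / sqrt(2 pi y), y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

def numeric_density(y, h=1e-6):
    """Central-difference derivative of G at y."""
    return (G(y + h) - G(y - h)) / (2.0 * h)

for y in (0.25, 1.0, 3.0):
    print(y, numeric_density(y), chi2_1_density(y))
```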


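4.29. (a) $\text{failure rate} = \dfrac{F(x + h) - F(x)}{1 - F(x)}$.

(b) $\text{Hazard rate} = \dfrac{dF(x)/dx}{1 - F(x)} = \dfrac{(1/\sigma)e^{-x/\sigma}}{e^{-x/\sigma}} = \dfrac{1}{\sigma}$.

(c) If

$$\frac{dF(x)/dx}{1 - F(x)} = c,$$

then

$$-\log[1 - F(x)] = cx + c_1;$$

hence,

$$1 - F(x) = c_2 e^{-cx}.$$

Since $F(0) = 0$, $c_2 = 1$. Therefore,

$$F(x) = 1 - e^{-cx}.$$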





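4.30.

$$\begin{aligned}
P(Y_n = r) &= \frac{n(n - 1)\cdots(n - r + 1)}{r!}\left(\frac{\lambda t}{n}\right)^r \left(1 - \frac{\lambda t}{n}\right)^{n - r} \\
&= \frac{n(n - 1)\cdots(n - r + 1)}{n^r}\,\frac{(\lambda t)^r}{r!}\left(1 - \frac{\lambda t}{n}\right)^{-r}\left(1 - \frac{\lambda t}{n}\right)^{n}.
\end{aligned}$$

As $n \to \infty$, the first $r$ factors on the right tend to 1, the next factor is fixed, the next tends to 1, and the last factor tends to $e^{-\lambda t}$. Hence,

$$\lim_{n \to \infty} P(Y_n = r) = \frac{e^{-\lambda t}(\lambda t)^r}{r!}.$$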
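The limit in 4.30 can be watched numerically. This sketch compares the binomial probability with $p = \lambda t / n$ against the Poisson limit $e^{-\lambda t}(\lambda t)^r / r!$; the values $\lambda t = 2$ and $r = 3$ are arbitrary illustrative choices.

```python
import math

def binom_pmf(r, n, p):
    """P(Y_n = r) for a binomial(n, p) variable."""
    return math.comb(n, r) * p**r * (1.0 - p) ** (n - r)

def poisson_pmf(r, mu):
    """Poisson probability e^{-mu} mu^r / r!."""
    return math.exp(-mu) * mu**r / math.factorial(r)

lam_t = 2.0   # illustrative value of lambda * t
r = 3
for n in (10, 100, 1_000, 10_000):
    print(n, binom_pmf(r, n, lam_t / n), poisson_pmf(r, lam_t))
```

The gap shrinks roughly like $1/n$, which matches the factors in the proof that tend to 1 at that rate.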


CHAPTER 5 


5.1. (a) The sequence $\{b_n\}_{n=1}^{\infty}$ is monotone increasing, since

$$\max\{a_1, a_2, \ldots, a_n\} \le \max\{a_1, a_2, \ldots, a_{n+1}\}.$$

It is also bounded. Therefore, it is convergent by Theorem 5.1.2. Its limit is $\sup_{n \ge 1} a_n$, since $a_n \le b_n \le \sup_{n \ge 1} a_n$ for $n \ge 1$.

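(b) Let $d_i = \log a_i - \log c$, $i = 1, 2, \ldots$. Then

$$\log c_n = \frac{1}{n}\sum_{i=1}^{n} \log a_i = \log c + \frac{1}{n}\sum_{i=1}^{n} d_i.$$

To show that $(1/n)\sum_{i=1}^{n} d_i \to 0$ as $n \to \infty$: we have that $d_i \to 0$ as $i \to \infty$. Therefore, for a given $\epsilon > 0$, there exists a positive integer $N_1$ such that $|d_i| < \epsilon/2$ if $i > N_1$. Thus, for $n > N_1$,

$$\left|\frac{1}{n}\sum_{i=1}^{n} d_i\right| \le \frac{1}{n}\sum_{i=1}^{N_1} |d_i| + \frac{\epsilon}{2n}(n - N_1) < \frac{1}{n}\sum_{i=1}^{N_1} |d_i| + \frac{\epsilon}{2}.$$

Furthermore, there exists a positive integer $N_2$ such that

$$\frac{1}{n}\sum_{i=1}^{N_1} |d_i| < \frac{\epsilon}{2}$$

if $n > N_2$. Hence, if $n > \max(N_1, N_2)$, $\left|(1/n)\sum_{i=1}^{n} d_i\right| < \epsilon$, which implies that $(1/n)\sum_{i=1}^{n} d_i \to 0$, that is, $\log c_n \to \log c$, and $c_n \to c$ as $n \to \infty$.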





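5.2. Let $a$ and $b$ be the limits of $\{a_n\}$ and $\{b_n\}$, respectively. These limits exist because the two sequences are of the Cauchy type. To show that $d_n \to d$, where $d = |a - b|$:

$$|d_n - d| = \bigl||a_n - b_n| - |a - b|\bigr| \le |(a_n - b_n) - (a - b)| \le |a_n - a| + |b_n - b|.$$

It is now obvious that $d_n \to d$ as $a_n \to a$ and $b_n \to b$.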


5.3. Suppose that $a_n \to c$. Then, for a given $\epsilon > 0$, there exists a positive integer $N$ such that $|a_n - c| < \epsilon$ if $n > N$. Let $b_n = a_{k_n}$ be the $n$th term of a subsequence. Since $k_n \ge n$, we have $|a_{k_n} - c| < \epsilon$ if $n > N$. Hence, $b_n \to c$. Vice versa, if every subsequence converges to $c$, then $a_n \to c$, since $\{a_n\}$ is a subsequence of itself.


5.4. If $E$ is not bounded, then there exists a subsequence $\{b_n\}$ such that $b_n \to \infty$, where $b_n = a_{k_n}$, $k_1 < k_2 < \cdots < k_n < \cdots$. This is a contradiction, since $\{a_n\}$ is bounded.


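5.5. (a) For a given $\epsilon > 0$, there exists a positive integer $N$ such that $|a_n - c| < \epsilon$ if $n > N$. Thus, there is a positive constant $M$ such that $|a_n - c| < M$ for all $n$. Now,

$$\begin{aligned}
\left|\frac{\sum_{i=1}^{n} \alpha_i a_i}{\sum_{i=1}^{n} \alpha_i} - c\right|
&= \frac{1}{\sum_{i=1}^{n} \alpha_i}\left|\sum_{i=1}^{n} \alpha_i (a_i - c)\right| \\
&\le \frac{1}{\sum_{i=1}^{n} \alpha_i}\sum_{i=1}^{N} \alpha_i |a_i - c| + \frac{1}{\sum_{i=1}^{n} \alpha_i}\sum_{i=N+1}^{n} \alpha_i |a_i - c| \\
&< \frac{M\sum_{i=1}^{N} \alpha_i}{\sum_{i=1}^{n} \alpha_i} + \epsilon.
\end{aligned}$$

Hence, since $\sum_{i=1}^{n} \alpha_i \to \infty$, the right-hand side is less than $2\epsilon$ for all sufficiently large $n$, and $\sum_{i=1}^{n} \alpha_i a_i / \sum_{i=1}^{n} \alpha_i \to c$ as $n \to \infty$.

(b) Let $a_n = (-1)^n$. Then $\{a_n\}$ does not converge. But $(1/n)\sum_{i=1}^{n} a_i$ is equal to zero if $n$ is even, and is equal to $-1/n$ if $n$ is odd. Hence, $(1/n)\sum_{i=1}^{n} a_i$ goes to zero as $n \to \infty$.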

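5.6. For a given $\epsilon > 0$, there exists a positive integer $N$ such that

$$\left|\frac{a_{n+1}}{a_n} - b\right| < \epsilon$$

if $n > N$. Since $b < 1$, we can choose $\epsilon$ so that $b + \epsilon < 1$. Then

$$\frac{a_{n+1}}{a_n} < b + \epsilon < 1, \quad n > N.$$

Hence, for $n \ge N + 1$,

$$\begin{aligned}
a_{N+2} &< a_{N+1}(b + \epsilon), \\
a_{N+3} &< a_{N+2}(b + \epsilon) < a_{N+1}(b + \epsilon)^2, \\
&\;\;\vdots \\
a_n &\le a_{N+1}(b + \epsilon)^{n - N - 1} = \frac{a_{N+1}}{(b + \epsilon)^{N+1}}(b + \epsilon)^n.
\end{aligned}$$

Letting $c = a_{N+1}/(b + \epsilon)^{N+1}$, $r = b + \epsilon$, we get $a_n \le c r^n$, where $0 < r < 1$, for $n \ge N + 1$.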

5.7. We first note that for each $n$, $a_n > 0$, which can be proved by induction. Let us consider two cases.

1. $b > 1$. In this case, we can show that the sequence is (i) bounded from above, and (ii) monotone increasing:

i. The sequence is bounded from above by $\sqrt{b}$, that is, $a_n \le \sqrt{b}$: $a_n^2 < b$ is true for $n = 1$ because $a_1 = 1 < b$. Suppose now that $a_n^2 < b$; to show that $a_{n+1}^2 < b$:

$$b - a_{n+1}^2 = b - \frac{a_n^2(3b + a_n^2)^2}{(3a_n^2 + b)^2} = \frac{(b - a_n^2)^3}{(3a_n^2 + b)^2} > 0.$$

Thus $a_{n+1}^2 < b$, and the sequence is bounded from above by $\sqrt{b}$.

ii. The sequence is monotone increasing: we have that $(3b + a_n^2)/(3a_n^2 + b) > 1$, since $a_n^2 < b$, as was seen earlier. Hence, $a_{n+1} > a_n$ for all $n$.

By Corollary 5.1.1(1), the sequence must be convergent. Let $c$ be its limit. We then have the equation

$$c = \frac{c(3b + c^2)}{3c^2 + b},$$

which results from taking the limit of both sides of $a_{n+1} = a_n(3b + a_n^2)/(3a_n^2 + b)$ and noting that $\lim_{n\to\infty} a_{n+1} = \lim_{n\to\infty} a_n = c$. Since $c \ge a_1 = 1 > 0$, the only solution to the above equation is $c = \sqrt{b}$.

2. $b < 1$. In this case, we can similarly show that the sequence is bounded from below and is monotone decreasing. Therefore, by Corollary 5.1.1(2) it must be convergent. Its limit is equal to $\sqrt{b}$.
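The two cases of 5.7 are easy to observe numerically. The recursion $a_{n+1} = a_n(3b + a_n^2)/(3a_n^2 + b)$ is Halley's iteration for $x^2 - b = 0$ (an observation added here, not made in the solution), so it converges very fast. This sketch starts from $a_1 = 1$ as in the solution; the test values $b = 5$ and $b = 0.2$ are arbitrary.

```python
import math

def halley_sqrt_sequence(b, steps=8, a1=1.0):
    """Return a_1, ..., a_{steps+1} for a_{n+1} = a_n (3b + a_n^2) / (3 a_n^2 + b)."""
    seq = [a1]
    for _ in range(steps):
        a = seq[-1]
        seq.append(a * (3.0 * b + a * a) / (3.0 * a * a + b))
    return seq

for b in (5.0, 0.2):
    seq = halley_sqrt_sequence(b)
    print(b, [round(x, 10) for x in seq[:4]], seq[-1], math.sqrt(b))
```

For $b = 5$ the terms increase toward $\sqrt{5}$ from below; for $b = 0.2$ they decrease toward $\sqrt{0.2}$ from above, matching cases 1 and 2.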

5.8. The sequence is bounded from above by 3, that is, $a_n < 3$ for all $n$: this is true for $n = 1$. If it is true for $n$, then

$$a_{n+1} = (2 + a_n)^{1/2} < (2 + 3)^{1/2} < 3.$$

By induction, $a_n < 3$ for all $n$. Furthermore, the sequence is monotone increasing: $a_1 < a_2$, since $a_1 = 1$, $a_2 = \sqrt{3}$. If $a_n < a_{n+1}$, then

$$a_{n+2} = (2 + a_{n+1})^{1/2} > (2 + a_n)^{1/2} = a_{n+1}.$$

By induction, $a_n < a_{n+1}$ for all $n$. By Corollary 5.1.1(1) the sequence must be convergent. Let $c$ be its limit, which can be obtained by solving the equation

$$c = (2 + c)^{1/2}.$$

The only solution is $c = 2$.

5.10. Let $m = 2n$. Then $a_m - a_n = 1/(n + 1) + 1/(n + 2) + \cdots + 1/(2n)$. Hence,

$$a_m - a_n \ge \frac{n}{2n} = \frac{1}{2}.$$

Therefore, $|a_m - a_n|$ cannot be made less than $\frac{1}{2}$ no matter how large $m$ and $n$ are, if $m = 2n$. This violates the condition for a sequence to be Cauchy.
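The bound in 5.10 can be confirmed with exact rational arithmetic. This sketch computes $a_{2n} - a_n$ for the harmonic partial sums using `fractions.Fraction`, so no rounding is involved.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum a_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 10, 50):
    gap = harmonic(2 * n) - harmonic(n)
    print(n, gap if n <= 2 else float(gap))
```

The gap equals $\frac{1}{2}$ exactly at $n = 1$ and exceeds $\frac{1}{2}$ afterwards (it tends to $\log 2$), so it never falls below $\frac{1}{2}$.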





5.11. For $m > n$, we have that

$$\begin{aligned}
|a_m - a_n| &= |(a_m - a_{m-1}) + (a_{m-1} - a_{m-2}) + \cdots + (a_{n+2} - a_{n+1}) + (a_{n+1} - a_n)| \\
&\le br^{m-1} + br^{m-2} + \cdots + br^n \\
&= br^n(1 + r + r^2 + \cdots + r^{m-n-1}) \\
&= \frac{br^n(1 - r^{m-n})}{1 - r} < \frac{br^n}{1 - r}.
\end{aligned}$$

For a given $\epsilon > 0$, we choose $n$ large enough such that $br^n/(1 - r) < \epsilon$, which implies that $|a_m - a_n| < \epsilon$, that is, the sequence satisfies the Cauchy criterion; hence, it is convergent.

5.12. If $\{s_n\}$ is bounded from above, then it must be convergent, since it is monotone increasing, and thus $\sum_{n=1}^{\infty} a_n = \lim_{n\to\infty} s_n$ converges. Vice versa, if $\sum_{n=1}^{\infty} a_n$ is convergent, then $\{s_n\}$ is a convergent sequence; hence, it is bounded, by Theorem 5.1.1.








Hence, 



1 \ 




î 



1 



Therefore, the series is divergent by the comparison test, since $\sum_{n=1}^{\infty} 1/n^p$ is divergent for $p \le 1$.


5.15. (a) Suppose that $a_n < M$ for all $n$, where $M$ is a positive constant. Then

$$\frac{a_n}{1 + a_n} > \frac{a_n}{1 + M}.$$

The series $\sum_{n=1}^{\infty} a_n/(1 + a_n)$ is divergent by the comparison test, since $\sum_{n=1}^{\infty} a_n$ is divergent.

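(b) We have that

$$\frac{a_n}{1 + a_n} = 1 - \frac{1}{1 + a_n}.$$

If $\{a_n\}_{n=1}^{\infty}$ is not bounded, then it has a subsequence $\{a_{k_n}\}$ with $\lim_{n\to\infty} a_{k_n} = \infty$; hence, $\lim_{n\to\infty} a_{k_n}/(1 + a_{k_n}) = 1$, and the terms $a_n/(1 + a_n)$ do not go to zero. Therefore, $\sum_{n=1}^{\infty} a_n/(1 + a_n)$ is divergent.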


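5.16. The sequence $\{s_n\}_{n=1}^{\infty}$ is monotone increasing. Hence, for $n = 2, 3, \ldots$,

$$\frac{a_n}{s_n^2} \le \frac{a_n}{s_{n-1}s_n} = \frac{s_n - s_{n-1}}{s_{n-1}s_n} = \frac{1}{s_{n-1}} - \frac{1}{s_n},$$

$$\sum_{i=1}^{n} \frac{a_i}{s_i^2} = \frac{1}{a_1} + \sum_{i=2}^{n} \frac{a_i}{s_i^2} \le \frac{1}{a_1} + \sum_{i=2}^{n}\left(\frac{1}{s_{i-1}} - \frac{1}{s_i}\right) = \frac{2}{a_1} - \frac{1}{s_n} \to \frac{2}{a_1},$$

since $s_n \to \infty$ by the divergence of $\sum_{n=1}^{\infty} a_n$. It follows that $\sum_{n=1}^{\infty} a_n/s_n^2$ is a convergent series.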





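5.17. Let $A$ denote the sum of the series $\sum_{n=1}^{\infty} a_n$. Then $r_n = A - s_{n-1}$, where $s_n = \sum_{i=1}^{n} a_i$. The sequence $\{r_n\}_{n=2}^{\infty}$ is monotone decreasing. Hence,

$$\sum_{i=m}^{n-1} \frac{a_i}{r_i} \ge \frac{a_m + a_{m+1} + \cdots + a_{n-1}}{r_m} = \frac{r_m - r_n}{r_m} = 1 - \frac{r_n}{r_m}.$$

Since $r_n \to 0$, we have $1 - r_n/r_m \to 1$ as $n \to \infty$. Therefore, for $0 < \epsilon < 1$, there exists a positive integer $k$ such that

$$\frac{a_m}{r_m} + \frac{a_{m+1}}{r_{m+1}} + \cdots + \frac{a_{m+k}}{r_{m+k}} > \epsilon.$$

This implies that the series $\sum_{n=1}^{\infty} a_n/r_n$ does not satisfy the Cauchy criterion. Hence, it is divergent.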

5.18. $(1/n)\log(1/n) = -(\log n)/n \to 0$ as $n \to \infty$. Hence, $(1/n)^{1/n} \to 1$. Similarly, $(1/n)\log(1/n^2) = -(2\log n)/n \to 0$, which implies $(1/n^2)^{1/n} \to 1$.

5.19. (a) $a_n^{1/n} = n^{1/n} - 1 \to 0 < 1$ as $n \to \infty$. Therefore, $\sum_{n=1}^{\infty} a_n$ is convergent by the root test.

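(b) $a_n < [\log(1 + n)]/n^2$. Since

$$\int_1^{\infty} \frac{\log(1 + x)}{x^2}\,dx = \left[-\frac{\log(1 + x)}{x}\right]_1^{\infty} + \int_1^{\infty} \frac{dx}{x(x + 1)} = \log 2 + \left[\log\frac{x}{x + 1}\right]_1^{\infty} = 2\log 2,$$

the series $\sum_{n=1}^{\infty} [\log(1 + n)]/n^2$ is convergent by the integral test. Hence, $\sum_{n=1}^{\infty} a_n$ converges by the comparison test.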

(c) $a_n/a_{n+1} = (2n + 2)(2n + 3)/(2n + 1)^2 \Rightarrow \lim_{n\to\infty} n(a_n/a_{n+1} - 1) = \lim_{n\to\infty} n(6n + 5)/(2n + 1)^2 = 3/2 > 1$. The series $\sum_{n=1}^{\infty} a_n$ is convergent by Raabe's test.

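(d)

$$a_n = \sqrt{n + 2\sqrt{n}} - \sqrt{n} = \frac{2\sqrt{n}}{\sqrt{n + 2\sqrt{n}} + \sqrt{n}} \to 1 \quad \text{as } n \to \infty.$$

Therefore, since $a_n$ does not tend to zero, $\sum_{n=1}^{\infty} a_n$ is divergent.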



(e) $|a_n|^{1/n} = A/n \to 0 < 1$ as $n \to \infty$. The series is absolutely convergent by the root test.


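(f) $a_n = (-1)^n \sin(\pi/n)$. For large $n$, $\sin(\pi/n) \sim \pi/n$. Since $\sin(\pi/n)$ decreases to zero for $n \ge 2$, the series converges by the alternating series test, while $\sum_{n=1}^{\infty} \sin(\pi/n)$ diverges by comparison with $\sum_{n=1}^{\infty} \pi/n$. Therefore, $\sum_{n=1}^{\infty} a_n$ is conditionally convergent.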
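The Raabe computation in 5.19(c) can be checked with exact fractions. This sketch takes the ratio $a_n/a_{n+1} = (2n + 2)(2n + 3)/(2n + 1)^2$ as read from the solution above and evaluates $n(a_n/a_{n+1} - 1)$, which simplifies to $n(6n + 5)/(2n + 1)^2$.

```python
from fractions import Fraction

def raabe_quantity(n):
    """n (a_n / a_{n+1} - 1) with a_n / a_{n+1} = (2n+2)(2n+3) / (2n+1)^2."""
    ratio = Fraction((2 * n + 2) * (2 * n + 3), (2 * n + 1) ** 2)
    return n * (ratio - 1)

for n in (10, 1_000, 100_000):
    print(n, float(raabe_quantity(n)))
```

The values approach $3/2$ from below, so the limit exceeds 1 and Raabe's test gives convergence.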


5.20. (a) Applying the ratio test, we have the condition

$$|x| \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| < 1,$$

which is equivalent to $|x| < \sqrt{3}$. The series is divergent if $|x| = \sqrt{3}$. Hence, it is uniformly convergent on $[-r, r]$ where $r < \sqrt{3}$.

(b) Using the ratio test, we get

$$|x| \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| < 1,$$

where $a_n = 10^n/n$. This is equivalent to

$$10|x| < 1.$$

The series is divergent if $x = 1/10$, since it then reduces to the harmonic series. Hence, it is uniformly convergent on $[-r, r]$ where $r < 1/10$.

(c) The series is uniformly convergent on $[-r, r]$ where $r < 1$.

(d) The series is uniformly convergent everywhere by Weierstrass's M-test, since

$$\left|\frac{\cos nx}{n(n^2 + 1)}\right| \le \frac{1}{n(n^2 + 1)} \quad \text{for all } x,$$

and the series whose $n$th term is $1/[n(n^2 + 1)]$ is convergent.


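5.21. The series $\sum_{n=1}^{\infty} a_n$ is convergent by Theorem 5.2.14. Let $s$ denote its sum:

$$s = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots < 1 - \frac{1}{2} + \frac{1}{3} = \frac{5}{6} = \frac{10}{12}.$$

Now,

$$\sum_{n=1}^{\infty} b_n = \left(1 + \frac{1}{3} - \frac{1}{2}\right) + \left(\frac{1}{5} + \frac{1}{7} - \frac{1}{4}\right) + \cdots = \sum_{n=1}^{\infty}\left(\frac{1}{4n - 3} + \frac{1}{4n - 1} - \frac{1}{2n}\right).$$

Let $s'_{3n}$ denote the sum of the first $3n$ terms of $\sum_{n=1}^{\infty} b_n$. Then,

$$s'_{3n} = s_{2n} + u_n,$$

where $s_{2n}$ is the sum of the first $2n$ terms of $\sum_{n=1}^{\infty} a_n$, and

$$u_n = \frac{1}{2n + 1} + \frac{1}{2n + 3} + \cdots + \frac{1}{4n - 1}.$$

The sequence $\{u_n\}_{n=1}^{\infty}$ is monotone increasing and is bounded from above by $\frac{1}{2}$, since

$$u_n \le \frac{n}{2n + 1}, \quad n = 1, 2, \ldots$$

(the number of terms that make up $u_n$ is equal to $n$). Thus $u_n$ is convergent, which implies convergence of $s'_{3n}$. Let $t$ denote the sum of this series. Note that

$$t > \left(1 + \frac{1}{3} - \frac{1}{2}\right) + \left(\frac{1}{5} + \frac{1}{7} - \frac{1}{4}\right) = \frac{389}{420},$$

since $1/(4n - 3) + 1/(4n - 1) - 1/(2n) > 0$ for all $n$. Hence, $t > 389/420 > 10/12 > s$, so the rearranged series has a different sum.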
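The comparison in 5.21 can be seen from partial sums. This sketch computes partial sums of the alternating harmonic series and of the rearranged series in blocks of three. For reference it prints $\log 2$ and $\frac{3}{2}\log 2$, the known sums of the two series; these values are a standard fact quoted here, not something derived in the solution.

```python
import math

def alternating_harmonic(num_terms):
    """s_N = 1 - 1/2 + 1/3 - ... (N terms)."""
    return sum((-1) ** (k + 1) / k for k in range(1, num_terms + 1))

def rearranged(num_blocks):
    """s'_{3n}: sum of the first n blocks 1/(4k-3) + 1/(4k-1) - 1/(2k)."""
    return sum(1 / (4 * k - 3) + 1 / (4 * k - 1) - 1 / (2 * k)
               for k in range(1, num_blocks + 1))

print(rearranged(2), 389 / 420)
print(alternating_harmonic(200_000), math.log(2))
print(rearranged(200_000), 1.5 * math.log(2))
```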
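5.22. Let $c_n$ denote the $n$th term of Cauchy's product. Then

$$c_n = \sum_{k=0}^{n} a_k a_{n-k} = (-1)^n \sum_{k=0}^{n} \frac{1}{[(n - k + 1)(k + 1)]^{1/2}}.$$

Since $(n - k + 1)(k + 1) \le (n + 1)^2$ for $k = 0, 1, \ldots, n$,

$$|c_n| \ge \sum_{k=0}^{n} \frac{1}{n + 1} = 1.$$

Hence, $c_n$ does not go to zero as $n \to \infty$. Therefore, $\sum_{n=0}^{\infty} c_n$ is divergent.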
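A direct computation illustrates 5.22. This sketch assumes the terms $a_k = (-1)^k/\sqrt{k + 1}$ that appear in the reconstructed display above, forms the Cauchy-product terms $c_n = \sum_{k=0}^{n} a_k a_{n-k}$, and checks that $|c_n| \ge 1$ with sign $(-1)^n$.

```python
import math

def a(k):
    """a_k = (-1)^k / sqrt(k + 1), as in the display above."""
    return (-1) ** k / math.sqrt(k + 1)

def cauchy_term(n):
    """c_n = sum_{k=0}^{n} a_k a_{n-k}."""
    return sum(a(k) * a(n - k) for k in range(n + 1))

for n in (0, 1, 5, 50, 500):
    print(n, cauchy_term(n))
```

The magnitudes increase slowly toward $\pi$ instead of shrinking, so the product series cannot converge.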

5.23. $f_n(x) \to f(x)$, where

$$f(x) = \begin{cases} 0, & x = 0, \\ 1/x, & x > 0. \end{cases}$$

The convergence is not uniform on $[0, \infty)$, since $f_n(x)$ is continuous on $[0, \infty)$ for all $n$, but $f(x)$ is discontinuous at $x = 0$.

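5.24. (a) $1/n^x \le 1/n^{1+\delta}$ for $x \ge 1 + \delta$. Since $\sum_{n=1}^{\infty} 1/n^{1+\delta}$ is a convergent series, $\sum_{n=1}^{\infty} 1/n^x$ is uniformly convergent on $[1 + \delta, \infty)$ by Weierstrass's M-test.

(b) $\dfrac{d}{dx}(1/n^x) = -(\log n)/n^x$. The series $\sum_{n=1}^{\infty} (\log n)/n^x$ is uniformly convergent on $[1 + \delta, \infty)$:

$$\frac{\log n}{n^x} \le \frac{1}{n^{1+\delta/2}}\,\frac{\log n}{n^{\delta/2}}, \quad x \ge \delta + 1.$$

Since $(\log n)/n^{\delta/2} \to 0$ as $n \to \infty$, there exists a positive integer $N$ such that

$$\frac{\log n}{n^{\delta/2}} < 1 \quad \text{if } n > N.$$

Therefore,

$$\frac{\log n}{n^x} < \frac{1}{n^{1+\delta/2}}$$

if $n > N$ and $x \ge \delta + 1$. But $\sum_{n=1}^{\infty} 1/n^{1+\delta/2}$ is convergent. Hence, $\sum_{n=1}^{\infty} (\log n)/n^x$ is uniformly convergent on $[1 + \delta, \infty)$ by Weierstrass's M-test. If $\zeta(x)$ denotes the sum of the series $\sum_{n=1}^{\infty} 1/n^x$, then

$$\zeta'(x) = -\sum_{n=1}^{\infty} \frac{\log n}{n^x}, \quad x \ge \delta + 1.$$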





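5.25. We have that

$$\sum_{n=0}^{\infty} \binom{n + k - 1}{n} x^n = \frac{1}{(1 - x)^k}, \quad k = 1, 2, \ldots,$$

if $-1 < x < 1$ (see Example 5.4.4). Differentiating this series term by term, we get

$$\sum_{n=1}^{\infty} n x^{n-1} \binom{n + k - 1}{n} = \frac{k}{(1 - x)^{k+1}}.$$

It follows that

$$\sum_{n=0}^{\infty} n \binom{n + r - 1}{n} p^r (1 - p)^n = p^r\,\frac{r(1 - p)}{p^{r+1}} = \frac{r(1 - p)}{p}.$$

Taking now second derivatives, we obtain

$$\sum_{n=2}^{\infty} n(n - 1) x^{n-2} \binom{n + k - 1}{n} = \frac{k(k + 1)}{(1 - x)^{k+2}}.$$

From this we conclude that

$$\sum_{n=1}^{\infty} n^2 \binom{n + k - 1}{n} x^n = \frac{kx(1 + kx)}{(1 - x)^{k+2}}.$$

Hence,

$$\sum_{n=0}^{\infty} n^2 \binom{n + r - 1}{n} p^r (1 - p)^n = \frac{p^r\, r(1 - p)[1 + r(1 - p)]}{p^{r+2}} = \frac{r(1 - p)(1 + r - rp)}{p^2}.$$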
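Both series in 5.25 are the first two moments of a negative binomial distribution, so they can be checked by brute-force summation. This sketch truncates the sums at an arbitrary 2000 terms (the tail is negligible for the chosen values) and uses the illustrative parameters $r = 3$, $p = 0.4$.

```python
import math

def nb_pmf(n, r, p):
    """C(n + r - 1, n) p^r (1 - p)^n, n = 0, 1, 2, ..."""
    return math.comb(n + r - 1, n) * p**r * (1.0 - p) ** n

def nb_moments(r, p, terms=2000):
    """Truncated sums of n * pmf and n^2 * pmf."""
    m1 = sum(n * nb_pmf(n, r, p) for n in range(terms))
    m2 = sum(n * n * nb_pmf(n, r, p) for n in range(terms))
    return m1, m2

r, p = 3, 0.4
m1, m2 = nb_moments(r, p)
print(m1, r * (1 - p) / p)                          # first formula
print(m2, r * (1 - p) * (1 + r - r * p) / p**2)     # second formula
```

Subtracting the square of the first sum from the second gives $r(1 - p)/p^2$, the familiar negative binomial variance.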


5.26. 


m- 

rt = 0 

^ l 1 \ 

(««')' 

n = 0 


1 — ’ 


< 1 . 





The sériés converges uniformly on ( — co, where — log^. Yes, 

formula (5.63) can be applied, since there exists a neighborhood Ng(0) 
contained inside the interval of convergence by the fact that — log ^ > 0. 

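5.28. It is sufficient to show that the series $\sum_{n=1}^{\infty} (\mu_n'/n!)\,r^n$ is absolutely convergent for some $r > 0$ (see Theorem 5.6.1). Using the root test and Stirling's formula,

$$\begin{aligned}
\rho &= \limsup_{n\to\infty} \left[\frac{|\mu_n'|\,r^n}{n!}\right]^{1/n} \\
&= re \limsup_{n\to\infty} \left[\frac{|\mu_n'|^{1/n}}{(2\pi)^{1/2n}\, n^{1 + 1/2n}}\right] \\
&= re \limsup_{n\to\infty} \left[\frac{|\mu_n'|^{1/n}}{n}\right].
\end{aligned}$$

Let

$$m = \limsup_{n\to\infty} \left[\frac{|\mu_n'|^{1/n}}{n}\right].$$

If $m > 0$ and finite, then the series $\sum_{n=1}^{\infty} (\mu_n'/n!)\,r^n$ is absolutely convergent if $\rho < 1$, that is, if $r < 1/(em)$.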


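5.30. (a)

$$\begin{aligned}
E(X_n) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} \\
&= p \sum_{k=1}^{n} k \binom{n}{k} p^{k-1} (1 - p)^{n-k} \\
&= p \sum_{k=0}^{n-1} (k + 1) \binom{n}{k + 1} p^k (1 - p)^{n-k-1} \\
&= np \sum_{k=0}^{n-1} \frac{(n - 1)!}{k!\,(n - k - 1)!} p^k (1 - p)^{n-k-1} \\
&= np(p + 1 - p)^{n-1} \\
&= np.
\end{aligned}$$

We can similarly show that

$$E(X_n^2) = np(1 - p) + n^2p^2.$$

Hence,

$$\operatorname{Var}(X_n) = E(X_n^2) - n^2p^2 = np(1 - p).$$

(b) We have that $P(|Y_n| \ge r\sigma_n) \le 1/r^2$. Let $r\sigma_n = \epsilon$, where $\sigma_n^2 = p(1 - p)/n$. Then

$$P(|Y_n| \ge \epsilon) \le \frac{p(1 - p)}{n\epsilon^2}.$$

(c) As $n \to \infty$, $P(|Y_n| \ge \epsilon) \to 0$, which implies that $Y_n$ converges in probability to zero.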


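5.31. (a)

$$\begin{aligned}
\phi_n(t) &= (p_n e^t + q_n)^n, \quad q_n = 1 - p_n \\
&= [1 + p_n(e^t - 1)]^n.
\end{aligned}$$

Let $np_n = r_n$, where $r_n \to \mu$. Then

$$\phi_n(t) = \left[1 + \frac{r_n}{n}(e^t - 1)\right]^n \to \exp[\mu(e^t - 1)]$$

as $n \to \infty$. The limit of $\phi_n(t)$ is the moment generating function of a Poisson distribution with mean $\mu$. Thus $X_n$ has a limiting Poisson distribution with mean $\mu$.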


5.32. We hâve that 


Q = (l„ - Hj + ÆH 2 - Ÿ 

= (I„ - Hi) + - 2Æ^H4 + . 


Thus 


y'Qy = y'(I„ - Hi)y + k^y'K^y - 2k^y'K^y + ••• 
= 55^ + 

00 

= 55,,+ E(^-2)(-*)'-'5,. 

i = 3 





CHAPTER 6 

6.1. If $f(x)$ is Riemann integrable, then inequality (6.1) is satisfied; hence, equality (6.6) is true, as was shown to be the case in the proof of Theorem 6.2.1. Vice versa, if equality (6.6) is satisfied, then by the double inequality (6.2), $S(P, f)$ must have a limit as $\Delta_p \to 0$, which implies that $f(x)$ is Riemann integrable on $[a, b]$.

6.2. Consider the function $f(x)$ on $[0, 1]$ such that $f(x) = 0$ if $0 \le x < \frac{1}{2}$, $f(x) = 2$ if $x = 1$, and $f(x) = \sum_{i=0}^{n} 1/2^i$ if $(n + 1)/(n + 2) \le x < (n + 2)/(n + 3)$ for $n = 0, 1, 2, \ldots$. This function has a countable number of discontinuities at $\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots$, but is Riemann integrable on $[0, 1]$, since it is monotone increasing (see Theorem 6.3.2).
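The reason monotone functions are integrable can be seen directly on the step function of 6.2. On a uniform partition of $[0, 1]$ into $m$ pieces, the supremum on each subinterval of an increasing function is its right-endpoint value and the infimum is its left-endpoint value. So $US_P(f) - LS_P(f)$ telescopes to $[f(1) - f(0)]/m = 2/m$. This sketch evaluates that difference numerically.

```python
def f(x):
    """Step function of Exercise 6.2 on [0, 1]."""
    if x >= 1.0:
        return 2.0
    if x < 0.5:
        return 0.0
    n = 0
    # locate n with (n+1)/(n+2) <= x < (n+2)/(n+3)
    while x >= (n + 2) / (n + 3):
        n += 1
    return 2.0 - 0.5 ** n          # equals sum_{i=0}^{n} 1/2^i

def upper_minus_lower(num_intervals):
    """US_P(f) - LS_P(f) on the uniform partition of [0, 1].

    f is monotone increasing, so on [x_{i-1}, x_i] the supremum is f(x_i)
    and the infimum is f(x_{i-1})."""
    xs = [i / num_intervals for i in range(num_intervals + 1)]
    return sum((f(xs[i]) - f(xs[i - 1])) * (xs[i] - xs[i - 1])
               for i in range(1, num_intervals + 1))

for m in (10, 100, 1000):
    print(m, upper_minus_lower(m), 2.0 / m)
```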

6.3. Suppose that $f(x)$ has one discontinuity of the first kind at $x = c$, $a < c < b$. Let $\lim_{x\to c^-} f(x) = L_1$, $\lim_{x\to c^+} f(x) = L_2$, $L_1 \ne L_2$. In any partition $P$ of $[a, b]$, the point $c$ appears in at most two subintervals. The contribution to $US_P(f) - LS_P(f)$ from these intervals is at most $2(M - m)\Delta_p$, where $M$ and $m$ are the supremum and infimum of $f(x)$ on $[a, b]$, and this can be made as small as we please. Furthermore, $\sum M_i\,\Delta x_i - \sum m_i\,\Delta x_i$ for the remaining subintervals can also be made as small as we please. Hence, $US_P(f) - LS_P(f)$ can be made smaller than $\epsilon$ for any given $\epsilon > 0$. Therefore, $f(x)$ is Riemann integrable on $[a, b]$. A similar argument can be used if $f(x)$ has a finite number of discontinuities of the first kind in $[a, b]$.

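6.4. Consider the following partition of $[0, 1]$: $P = \{0, 1/(2n), 1/(2n - 1), \ldots, 1/3, 1/2, 1\}$. The number of partition points is $2n + 1$ including 0 and 1. In this case,

$$\begin{aligned}
\sum_{i=1}^{2n} |\Delta f_i| &= \left|\frac{1}{2n}\cos \pi n - 0\right| + \left|\frac{1}{2n - 1}\cos\frac{(2n - 1)\pi}{2} - \frac{1}{2n}\cos \pi n\right| \\
&\quad + \left|\frac{1}{2n - 2}\cos[\pi(n - 1)] - \frac{1}{2n - 1}\cos\frac{(2n - 1)\pi}{2}\right| + \cdots \\
&\quad + \left|\frac{1}{2}\cos \pi - \frac{1}{3}\cos\frac{3\pi}{2}\right| + \left|\cos\frac{\pi}{2} - \frac{1}{2}\cos \pi\right| \\
&= \frac{1}{n} + \frac{1}{n - 1} + \frac{1}{n - 2} + \cdots + \frac{1}{2} + 1.
\end{aligned}$$

As $n \to \infty$, $\sum_{i=1}^{2n} |\Delta f_i| \to \infty$, since $\sum_{n=1}^{\infty} 1/n$ is divergent. Hence, $f(x)$ is not of bounded variation on $[0, 1]$.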
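The sum in 6.4 can be reproduced numerically. The partition values above correspond to $f(x) = x\cos[\pi/(2x)]$ with $f(0) = 0$. That closed form is read off from the terms in the display and is an assumption of this sketch, which builds the partition $\{0, 1/(2n), \ldots, 1/2, 1\}$ and compares the variation with the harmonic number $1 + 1/2 + \cdots + 1/n$.

```python
import math

def f(x):
    """Assumed form f(x) = x cos(pi / (2x)) for x > 0, f(0) = 0."""
    return 0.0 if x == 0 else x * math.cos(math.pi / (2.0 * x))

def variation_on_partition(n):
    """Sum of |f(x_i) - f(x_{i-1})| over P = {0, 1/(2n), 1/(2n-1), ..., 1/2, 1}."""
    pts = [0.0] + [1.0 / k for k in range(2 * n, 0, -1)]
    return sum(abs(f(pts[i]) - f(pts[i - 1])) for i in range(1, len(pts)))

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

for n in (5, 50, 500):
    print(n, variation_on_partition(n), harmonic(n))
```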

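6.5. (a) For a given $\epsilon > 0$, there exists a constant $M > 0$ such that

$$\left|\frac{f'(x)}{g'(x)} - L\right| < \epsilon$$

if $x > M$. Since $g'(x) > 0$,

$$|f'(x) - Lg'(x)| < \epsilon g'(x).$$

Hence, if $\lambda_1$ and $\lambda_2$ are chosen larger than $M$ (with $\lambda_1 < \lambda_2$), then

$$\left|\int_{\lambda_1}^{\lambda_2} [f'(x) - Lg'(x)]\,dx\right| \le \int_{\lambda_1}^{\lambda_2} |f'(x) - Lg'(x)|\,dx < \epsilon \int_{\lambda_1}^{\lambda_2} g'(x)\,dx.$$

(b) From (a) we get

$$|f(\lambda_2) - Lg(\lambda_2) - f(\lambda_1) + Lg(\lambda_1)| < \epsilon[g(\lambda_2) - g(\lambda_1)].$$

Dividing both sides by $g(\lambda_2)$ [which is positive for large $\lambda_2$, since $g(x) \to \infty$ as $x \to \infty$], we obtain

$$\left|\frac{f(\lambda_2)}{g(\lambda_2)} - L - \frac{f(\lambda_1)}{g(\lambda_2)} + L\frac{g(\lambda_1)}{g(\lambda_2)}\right| < \epsilon\left[1 - \frac{g(\lambda_1)}{g(\lambda_2)}\right] < \epsilon.$$

Hence,

$$\left|\frac{f(\lambda_2)}{g(\lambda_2)} - L\right| < \epsilon + \frac{|f(\lambda_1)|}{g(\lambda_2)} + |L|\,\frac{|g(\lambda_1)|}{g(\lambda_2)}.$$

(c) For sufficiently large $\lambda_2$, the second and third terms on the right-hand side of the above inequality can each be made smaller than $\epsilon$; hence,

$$\left|\frac{f(\lambda_2)}{g(\lambda_2)} - L\right| < 3\epsilon.$$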


6.6. We have that

$$m\,g(x) \le f(x)g(x) \le M\,g(x),$$

where $m$ and $M$ are the infimum and supremum of $f(x)$ on $[a, b]$, respectively. Let $\xi$ and $\eta$ be points in $[a, b]$ such that $m = f(\xi)$, $M = f(\eta)$. We conclude that

$$f(\xi) \le \frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} \le f(\eta).$$

By the intermediate-value theorem (Theorem 3.4.4), there exists a constant $c$, between $\xi$ and $\eta$, such that

$$\frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} = f(c).$$

Note that $c$ can be equal to $\xi$ or $\eta$ in case equality is attained at the lower end or the upper end of the above double inequality.

6.9. Integration by parts gives

$$\int_a^b f(x)\,dg(x) = f(b)g(b) - f(a)g(a) + \int_a^b g(x)\,d[-f(x)].$$

Since $g(x)$ is bounded, $f(b)g(b) \to 0$ as $b \to \infty$. Let us now establish convergence of $\int_a^b g(x)\,d[-f(x)]$ as $b \to \infty$: let $M > 0$ be such that $|g(x)| \le M$ for all $x \ge a$. In addition,

$$\int_a^b M\,d[-f(x)] = Mf(a) - Mf(b) \to Mf(a) \quad \text{as } b \to \infty.$$

Hence, $\int_a^{\infty} M\,d[-f(x)]$ is a convergent integral. Since $-f(x)$ is monotone increasing on $[a, \infty)$, then

$$\int_a^{\infty} |g(x)|\,d[-f(x)] \le \int_a^{\infty} M\,d[-f(x)].$$

This implies absolute convergence of $\int_a^{\infty} g(x)\,d[-f(x)]$. It follows that the integral $\int_a^{\infty} f(x)\,dg(x)$ is convergent.





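6.10. Let $n$ be a positive integer. Then

$$\begin{aligned}
\int_0^{n\pi} \frac{|\sin x|}{x}\,dx &= \int_0^{\pi} \frac{\sin x}{x}\,dx - \int_{\pi}^{2\pi} \frac{\sin x}{x}\,dx + \cdots + (-1)^{n-1}\int_{(n-1)\pi}^{n\pi} \frac{\sin x}{x}\,dx \\
&= \int_0^{\pi} \sin x \left[\frac{1}{x} + \frac{1}{x + \pi} + \cdots + \frac{1}{x + (n - 1)\pi}\right] dx \\
&\ge \left(\frac{1}{\pi} + \frac{1}{2\pi} + \cdots + \frac{1}{n\pi}\right)\int_0^{\pi} \sin x\,dx \\
&= \frac{2}{\pi}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) \\
&\to \infty \quad \text{as } n \to \infty.
\end{aligned}$$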


6.11. (a) This is convergent by the integral test, since $\int_1^{\infty} (\log x)/(x\sqrt{x})\,dx = 4\int_0^{\infty} e^{-x} x\,dx = 4$ by a proper change of variable.

(b) $(n + 4)/(2n^3 + 1) \sim 1/(2n^2)$. But $\sum_{n=1}^{\infty} 1/n^2$ is convergent by the integral test, since $\int_1^{\infty} dx/x^2 = 1$. Hence, this series is convergent by the comparison test.

(c) $1/(\sqrt{n + 1} - 1) \sim 1/\sqrt{n}$. By the integral test, $\sum_{n=1}^{\infty} 1/\sqrt{n}$ is divergent, since $\int_1^{\infty} dx/\sqrt{x} = 2\sqrt{x}\,\big|_1^{\infty} = \infty$. Hence, $\sum_{n=1}^{\infty} 1/(\sqrt{n + 1} - 1)$ is divergent.


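6.13. (a) Near $x = 0$ we have $x^{m-1}(1 - x)^{n-1} \sim x^{m-1}$, and $\int_0^1 x^{m-1}\,dx = 1/m$. Also, near $x = 1$ we have $x^{m-1}(1 - x)^{n-1} \sim (1 - x)^{n-1}$, and $\int_0^1 (1 - x)^{n-1}\,dx = 1/n$. Hence, $B(m, n)$ converges if $m > 0$, $n > 0$.

(b) Let $\sqrt{x} = \sin\theta$. Then

$$\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$

(c) Let $x = 1/(1 + y)$. Then

$$\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = \int_0^{\infty} \frac{y^{n-1}}{(1 + y)^{m+n}}\,dy.$$

Letting $z = 1 - x$, we get

$$\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = -\int_1^0 (1 - z)^{m-1} z^{n-1}\,dz = \int_0^1 x^{n-1}(1 - x)^{m-1}\,dx.$$

Hence, $B(m, n) = B(n, m)$. It follows that

$$B(m, n) = \int_0^{\infty} \frac{x^{m-1}}{(1 + x)^{m+n}}\,dx.$$

(d)

$$B(m, n) = \int_0^{\infty} \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx = \int_0^1 \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx + \int_1^{\infty} \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx.$$

Let $y = 1/x$ in the second integral. We get

$$\int_1^{\infty} \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx = \int_0^1 \frac{y^{m-1}}{(1 + y)^{m+n}}\,dy.$$

Therefore,

$$B(m, n) = \int_0^1 \frac{x^{n-1}}{(1 + x)^{m+n}}\,dx + \int_0^1 \frac{x^{m-1}}{(1 + x)^{m+n}}\,dx = \int_0^1 \frac{x^{n-1} + x^{m-1}}{(1 + x)^{m+n}}\,dx.$$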
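The integral representations of $B(m, n)$ in 6.13(b) and (d) can be compared numerically with the closed form $\Gamma(m)\Gamma(n)/\Gamma(m + n)$. That identity is standard but is not derived in this solution. The sketch uses composite Simpson's rule and the arbitrary values $m = 2.5$, $n = 3$, chosen so that neither integrand is singular at the endpoints.

```python
import math

def simpson(g, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + i * h) for i in range(1, m, 2))
    s += 2.0 * sum(g(a + i * h) for i in range(2, m, 2))
    return s * h / 3.0

def beta_exact(m, n):
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

def beta_trig(m, n):
    """Form (b): 2 * integral over [0, pi/2] of sin^{2m-1} cos^{2n-1}."""
    g = lambda t: math.sin(t) ** (2 * m - 1) * math.cos(t) ** (2 * n - 1)
    return 2.0 * simpson(g, 0.0, math.pi / 2)

def beta_folded(m, n):
    """Form (d): integral over [0, 1] of (x^{n-1} + x^{m-1}) / (1 + x)^{m+n}."""
    g = lambda x: (x ** (n - 1) + x ** (m - 1)) / (1.0 + x) ** (m + n)
    return simpson(g, 0.0, 1.0)

print(beta_exact(2.5, 3.0), beta_trig(2.5, 3.0), beta_folded(2.5, 3.0))
```

Form (d) is visibly symmetric in $m$ and $n$, which is another way to see $B(m, n) = B(n, m)$.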


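6.14. (a)

$$\int_0^{\infty} \frac{dx}{\sqrt{1 + x^3}} = \int_0^1 \frac{dx}{\sqrt{1 + x^3}} + \int_1^{\infty} \frac{dx}{\sqrt{1 + x^3}}.$$

The first integral exists because the integrand is continuous. The second integral is convergent because

$$\int_1^{\infty} \frac{dx}{\sqrt{1 + x^3}} < \int_1^{\infty} \frac{dx}{x^{3/2}} = 2.$$

Hence, $\int_0^{\infty} dx/\sqrt{1 + x^3}$ is convergent.

(b) Divergent, since for large $x$ we have $1/(1 + x^3)^{1/3} \sim 1/(1 + x)$, and $\int_0^{\infty} dx/(1 + x) = \infty$.

(c) Convergent, since if $f(x) = 1/(1 - x^3)^{1/3}$ and $g(x) = 1/(1 - x)^{1/3}$, then

$$\lim_{x\to 1^-} \frac{f(x)}{g(x)} = \left(\frac{1}{3}\right)^{1/3}.$$

But $\int_0^1 g(x)\,dx = \frac{3}{2}$. Hence, $\int_0^1 f(x)\,dx$ is convergent.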





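(d) Partition the integral as the sum of

$$\int_0^1 \frac{dx}{\sqrt{x}(1 + 2x)} \quad \text{and} \quad \int_1^{\infty} \frac{dx}{\sqrt{x}(1 + 2x)}.$$

The first integral is convergent, since near $x = 0$ we have $1/[\sqrt{x}(1 + 2x)] \sim 1/\sqrt{x}$, and $\int_0^1 dx/\sqrt{x} = 2$. The second integral is also convergent, since as $x \to \infty$ we have $1/[\sqrt{x}(1 + 2x)] \sim 1/(2x^{3/2})$, and $\int_1^{\infty} dx/x^{3/2} = 2$.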


6.15. The proof of this result is similar to the proof of Corollary 6.4.2.


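6.16. It is easy to show that

$$h_n'(x) = -\frac{(b - x)^{n-1}}{(n - 1)!} f^{(n)}(x).$$

Hence,

$$h_n(a) = h_n(b) - \int_a^b h_n'(x)\,dx = \frac{1}{(n - 1)!}\int_a^b (b - x)^{n-1} f^{(n)}(x)\,dx,$$

since $h_n(b) = 0$. It follows that

$$f(b) = f(a) + (b - a)f'(a) + \cdots + \frac{(b - a)^{n-1}}{(n - 1)!} f^{(n-1)}(a) + \frac{1}{(n - 1)!}\int_a^b (b - x)^{n-1} f^{(n)}(x)\,dx.$$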
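The integral form of the remainder in 6.16 can be checked on a concrete function. This sketch takes $f = \exp$ (an illustrative choice, so every derivative is $e^x$), forms the Taylor polynomial of degree $n - 1$ at $a$, evaluates the remainder integral with composite Simpson's rule, and compares the total with $e^b$.

```python
import math

def simpson(g, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + i * h) for i in range(1, m, 2))
    s += 2.0 * sum(g(a + i * h) for i in range(2, m, 2))
    return s * h / 3.0

def taylor_parts(a, b, n):
    """Polynomial part and integral remainder of the expansion above, for f = exp."""
    poly = sum((b - a) ** k * math.exp(a) / math.factorial(k) for k in range(n))
    rem = simpson(lambda x: (b - x) ** (n - 1) * math.exp(x), a, b) / math.factorial(n - 1)
    return poly, rem

poly, rem = taylor_parts(0.0, 1.5, 4)
print(poly, rem, poly + rem, math.exp(1.5))
```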


6.17. Let $G(x) = \int_a^x g(t)\,dt$. Then, $G(x)$ is uniformly continuous and $G'(x) = g(x)$ by Theorem 6.4.8. Thus

$$\int_a^b f(x)g(x)\,dx = f(x)G(x)\Big|_a^b - \int_a^b G(x)f'(x)\,dx = f(b)G(b) - \int_a^b G(x)f'(x)\,dx.$$

Now, since $G(x)$ is continuous, then by Corollary 3.4.1,

$$G(\xi) \le G(x) \le G(\eta)$$

for all $x$ in $[a, b]$, where $\xi$ and $\eta$ are points in $[a, b]$ at which $G(x)$ achieves its infimum and supremum in $[a, b]$, respectively. Furthermore, since $f(x)$ is monotone (say monotone increasing) and $f'(x)$ exists, then $f'(x) \ge 0$ by Theorem 4.2.3. It follows that

$$G(\xi)\int_a^b f'(x)\,dx \le \int_a^b G(x)f'(x)\,dx \le G(\eta)\int_a^b f'(x)\,dx.$$

This implies that

$$\int_a^b G(x)f'(x)\,dx = \lambda\int_a^b f'(x)\,dx,$$

where $G(\xi) \le \lambda \le G(\eta)$. By Theorem 3.4.4, there exists a point $c$ between $\xi$ and $\eta$ such that $\lambda = G(c)$. Hence,

$$\int_a^b G(x)f'(x)\,dx = G(c)\int_a^b f'(x)\,dx = [f(b) - f(a)]\int_a^c g(x)\,dx.$$

Consequently,

$$\begin{aligned}
\int_a^b f(x)g(x)\,dx &= f(b)G(b) - \int_a^b G(x)f'(x)\,dx \\
&= f(b)\int_a^b g(x)\,dx - [f(b) - f(a)]\int_a^c g(x)\,dx \\
&= f(a)\int_a^c g(x)\,dx + f(b)\int_c^b g(x)\,dx.
\end{aligned}$$

6.18. This follows directly from applying the result in Exercise 6.17 and letting $f(x) = 1/x$, $g(x) = \sin x$:

$$\int_a^b \frac{\sin x}{x}\,dx = \frac{1}{a}\int_a^c \sin x\,dx + \frac{1}{b}\int_c^b \sin x\,dx = \frac{1}{a}(\cos a - \cos c) + \frac{1}{b}(\cos c - \cos b).$$

Hence, for $0 < a < b$,

$$\left|\int_a^b \frac{\sin x}{x}\,dx\right| \le \frac{1}{a}|\cos a - \cos c| + \frac{1}{b}|\cos c - \cos b| \le \frac{4}{a}.$$

Therefore,





6.21. Let $\epsilon > 0$ be given. Choose $\eta > 0$ such that

$$\eta[g(b) - g(a)] < \epsilon.$$

Since $f(x)$ is uniformly continuous on $[a, b]$ (see Theorem 3.4.6), there exists a $\delta > 0$ such that

$$|f(x) - f(z)| < \eta$$

if $|x - z| < \delta$ for all $x, z$ in $[a, b]$. Let us now choose a partition $P$ of $[a, b]$ whose norm is smaller than $\delta$. Then,

$$US_P(f, g) - LS_P(f, g) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta g_i \le \eta\sum_{i=1}^{n} \Delta g_i = \eta[g(b) - g(a)] < \epsilon.$$

It follows that $f(x)$ is Riemann-Stieltjes integrable on $[a, b]$.


6.22. Since $g(x)$ is continuous, for any positive integer $n$ we can choose a partition $P$ such that

$$\Delta g_i = \frac{g(b) - g(a)}{n}, \quad i = 1, 2, \ldots, n.$$

Also, since $f(x)$ is monotone increasing, then $M_i = f(x_i)$, $m_i = f(x_{i-1})$, $i = 1, 2, \ldots, n$. Hence,

$$US_P(f, g) - LS_P(f, g) = \frac{g(b) - g(a)}{n}\sum_{i=1}^{n} [f(x_i) - f(x_{i-1})] = \frac{g(b) - g(a)}{n}[f(b) - f(a)].$$

This can be made less than $\epsilon$, for any given $\epsilon > 0$, if $n$ is chosen large enough. It follows that $f(x)$ is Riemann-Stieltjes integrable with respect to $g(x)$.


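6.23. Let $f(x) = x^k/(1 + x^2)$, $g(x) = x^{k-2}$. Then $\lim_{x\to\infty} f(x)/g(x) = 1$. The integral $\int_1^{\infty} x^{k-2}\,dx$ is divergent, since $\int_1^{\infty} x^{k-2}\,dx = x^{k-1}/(k - 1)\,\big|_1^{\infty} = \infty$ if $k > 1$. If $k = 1$, then $\int_1^{\infty} x^{k-2}\,dx = \int_1^{\infty} dx/x = \infty$. By Theorem 6.5.3, $\int_1^{\infty} f(x)\,dx$ must be divergent if $k \ge 1$. A similar procedure can be used to show that $\int_{-\infty}^{-1} f(x)\,dx$ is divergent.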





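6.24. Since $E(X) = 0$, then $\operatorname{Var}(X) = E(X^2)$, which is given by

$$E(X^2) = \int_{-\infty}^{\infty} \frac{x^2 e^x}{(1 + e^x)^2}\,dx.$$

Let $u = e^x/(1 + e^x)$. The integral becomes

$$\begin{aligned}
E(X^2) &= \int_0^1 \left[\log\left(\frac{u}{1 - u}\right)\right]^2 du \\
&= \left\{\log\left(\frac{u}{1 - u}\right)[u\log u + (1 - u)\log(1 - u)]\right\}_0^1 - \int_0^1 \frac{u\log u + (1 - u)\log(1 - u)}{u(1 - u)}\,du \\
&= -\int_0^1 \frac{\log u}{1 - u}\,du - \int_0^1 \frac{\log(1 - u)}{u}\,du \\
&= -2\int_0^1 \frac{\log u}{1 - u}\,du.
\end{aligned}$$

But

$$\int_0^1 \frac{\log u}{1 - u}\,du = \int_0^1 (1 + u + u^2 + \cdots + u^n + \cdots)\log u\,du,$$

and

$$\int_0^1 u^n \log u\,du = \left[\frac{u^{n+1}}{n + 1}\log u\right]_0^1 - \int_0^1 \frac{u^n}{n + 1}\,du = -\frac{1}{(n + 1)^2}.$$

Hence,

$$\int_0^1 \frac{\log u}{1 - u}\,du = -\sum_{n=0}^{\infty} \frac{1}{(n + 1)^2} = -\frac{\pi^2}{6},$$

and $E(X^2) = \pi^2/3$.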
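The value $\pi^2/3$ obtained in 6.24 can be confirmed numerically. This sketch integrates $x^2 e^x/(1 + e^x)^2$ over $[-40, 40]$ with composite Simpson's rule; the tails beyond $\pm 40$ are negligible. It also sums the series $2\sum 1/(n + 1)^2$ from the last step. The density is evaluated in the equivalent form $e^{-|x|}/(1 + e^{-|x|})^2$ to avoid overflow.

```python
import math

def logistic_density(x):
    """e^x / (1 + e^x)^2, evaluated as e^{-|x|} / (1 + e^{-|x|})^2 to avoid overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def second_moment(L=40.0, m=40_000):
    """Composite Simpson approximation of the integral of x^2 * density over [-L, L]."""
    h = 2.0 * L / m
    g = lambda x: x * x * logistic_density(x)
    s = g(-L) + g(L)
    s += 4.0 * sum(g(-L + i * h) for i in range(1, m, 2))
    s += 2.0 * sum(g(-L + i * h) for i in range(2, m, 2))
    return s * h / 3.0

# -2 times the series -sum 1/(n+1)^2, truncated after 200000 terms
series = 2.0 * sum(1.0 / (n + 1) ** 2 for n in range(200_000))
print(second_moment(), series, math.pi ** 2 / 3)
```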

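6.26. Let $g(x) = f(x)/x^{\alpha}$, $h(x) = 1/x^{\alpha}$. We have that

$$\lim_{x\to 0^+} \frac{g(x)}{h(x)} = f(0) < \infty.$$

Since $\int_0^{\delta} h(x)\,dx = \delta^{1-\alpha}/(1 - \alpha)$ for any $\delta > 0$, by Theorem 6.5.7, $\int_0^{\delta} f(x)/x^{\alpha}\,dx$ exists. Furthermore,

Hence,

exists.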



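6.27. Let $g(x) = f(x)/x^{1+\alpha}$, $h(x) = 1/x$. Then

$$\lim_{x\to 0^+} \frac{g(x)}{h(x)} = k < \infty.$$

Since $\int_0^{\delta} h(x)\,dx$ is divergent for any $\delta > 0$, then so is $\int_0^{\delta} g(x)\,dx$ by Theorem 6.5.7, and hence

$$E[X^{-(1+\alpha)}] = \int_0^{\delta} g(x)\,dx + \int_{\delta}^{\infty} g(x)\,dx = \infty.$$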


6.28. Applying formula (6.84), the density function of $W$ is given by


g{w) = 



2 V — (rt + 1 )/2 
\ 


n 


! 


? 


W > 0. 


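6.29. (a) From Theorem 6.9.2, we have

$$P(|X - \mu| \ge u\sigma) \le \frac{1}{u^2}.$$

Letting $u\sigma = r$, we get

$$P(|X - \mu| \ge r) \le \frac{\sigma^2}{r^2}.$$

(b) This follows from (a) and the fact that

$$E(\bar{X}_n) = \mu, \quad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$

(c)

$$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}.$$

Hence, as $n \to \infty$, $P(|\bar{X}_n - \mu| \ge \epsilon) \to 0$.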

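6.30. (a) Using the hint, for any $u$ and $v$ we have

$$u^2\nu_{k-1} + 2uv\,\nu_k + v^2\nu_{k+1} \ge 0.$$

Since $\nu_{k-1} > 0$ for $k \ge 1$, we must therefore have

$$v^2\nu_k^2 - v^2\nu_{k-1}\nu_{k+1} \le 0, \quad k \ge 1,$$

that is,

$$\nu_k^2 \le \nu_{k-1}\nu_{k+1}, \quad k = 1, 2, \ldots, n - 1.$$

(b) $\nu_0 = 1$. For $k = 1$, $\nu_1^2 \le \nu_2$. Hence, $\nu_1 \le \nu_2^{1/2}$. Let us now use mathematical induction to show that $\nu_n^{1/n} \le \nu_{n+1}^{1/(n+1)}$ for all $n$ for which $\nu_n$ and $\nu_{n+1}$ exist. The statement is true for $n = 1$. Suppose that

$$\nu_{n-1}^{1/(n-1)} \le \nu_n^{1/n}.$$

To show that $\nu_n^{1/n} \le \nu_{n+1}^{1/(n+1)}$: we have that

$$\nu_n^2 \le \nu_{n-1}\nu_{n+1} \le \nu_n^{(n-1)/n}\,\nu_{n+1}.$$

Thus,

$$\nu_n^{(n+1)/n} \le \nu_{n+1}.$$

Hence,

$$\nu_n^{1/n} \le \nu_{n+1}^{1/(n+1)}.$$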


CHAPTER 7 


7.1. (a) Along $x_1 = 0$, we have $f(0, x_2) = 0$ for all $x_2$, and the limit is zero as $\mathbf{x} \to \mathbf{0}$. Any line through the origin can be represented by the equation $x_1 = tx_2$. Along this line, we have


f{tX2,X2) = 


X 2 ^ 0. 


tX2 

- 1 - 

t 

^2 

\ 


exp 


2 


^2 

l 

^2 

/ 


X 


2 


exp 


X 


2 


Using l'Hospital's rule (Theorem 4.2.6), $f(tx_2, x_2) \to 0$ as $x_2 \to 0$.





(b) Along the parabola $x_1 = x_2^2$, we have

$$f(x_2^2, x_2) = e^{-1}, \quad x_2 \ne 0,$$

which does not go to zero as $x_2 \to 0$. By Definition 7.2.1, $f(x_1, x_2)$ does not have a limit as $\mathbf{x} \to \mathbf{0}$.


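7.5. (a) This function has no limit as $\mathbf{x} \to \mathbf{0}$, as was shown in Example 7.2.2. Hence, it is not continuous at the origin.

(b)

$$\left.\frac{\partial f(x_1, x_2)}{\partial x_1}\right|_{\mathbf{x}=\mathbf{0}} = \lim_{\Delta x_1 \to 0} \frac{1}{\Delta x_1}\left[\frac{0 \cdot \Delta x_1}{(\Delta x_1)^2 + 0} - 0\right] = 0,$$

$$\left.\frac{\partial f(x_1, x_2)}{\partial x_2}\right|_{\mathbf{x}=\mathbf{0}} = \lim_{\Delta x_2 \to 0} \frac{1}{\Delta x_2}\left[\frac{0 \cdot \Delta x_2}{0 + (\Delta x_2)^2} - 0\right] = 0.$$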


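7.6. Applying formula (7.12), we get

$$\frac{df(\mathbf{u})}{dt} = \sum_{i=1}^{k} x_i \frac{\partial f(\mathbf{u})}{\partial u_i},$$

where $u_i$ is the $i$th element of $\mathbf{u} = t\mathbf{x}$, $i = 1, 2, \ldots, k$. But, on one hand,

$$\frac{\partial f(\mathbf{u})}{\partial x_i} = \sum_{j=1}^{k} \frac{\partial f(\mathbf{u})}{\partial u_j}\frac{\partial u_j}{\partial x_i} = t\,\frac{\partial f(\mathbf{u})}{\partial u_i},$$

and on the other hand, since $f(\mathbf{u}) = t^n f(\mathbf{x})$,

$$\frac{\partial f(\mathbf{u})}{\partial x_i} = t^n \frac{\partial f(\mathbf{x})}{\partial x_i}.$$

Hence,

$$\frac{\partial f(\mathbf{u})}{\partial u_i} = t^{n-1}\frac{\partial f(\mathbf{x})}{\partial x_i}.$$

Since also $df(\mathbf{u})/dt = d[t^n f(\mathbf{x})]/dt = nt^{n-1}f(\mathbf{x})$, it follows that

$$\sum_{i=1}^{k} x_i \frac{\partial f(\mathbf{u})}{\partial u_i} = nt^{n-1}f(\mathbf{x})$$

can be written as

$$t^{n-1}\sum_{i=1}^{k} x_i \frac{\partial f(\mathbf{x})}{\partial x_i} = nt^{n-1}f(\mathbf{x}),$$

that is,

$$\sum_{i=1}^{k} x_i \frac{\partial f(\mathbf{x})}{\partial x_i} = nf(\mathbf{x}).$$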
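Euler's identity from 7.6 can be checked numerically on any homogeneous function. The function below is a hypothetical example chosen for illustration only (it is homogeneous of degree $n = 3$). Its partial derivatives are approximated by central differences, and $\sum_i x_i\,\partial f/\partial x_i$ is compared with $3f(\mathbf{x})$.

```python
import math

def f(x1, x2, x3):
    """Hypothetical function, homogeneous of degree n = 3 (illustration only)."""
    return x1 ** 2 * x2 + 4.0 * x2 * x3 ** 2 - math.sqrt(x1 ** 6 + x3 ** 6)

def euler_sum(x, h=1e-6):
    """sum_i x_i * df/dx_i, each partial derivative by central differences."""
    total = 0.0
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += x[i] * (f(*xp) - f(*xm)) / (2.0 * h)
    return total

x = (1.0, 2.0, 0.5)
print(euler_sum(x), 3 * f(*x))
```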


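7.7. (a) $f(x_1, x_2)$ is not continuous at the origin, since along $x_1 = 0$ and $x_2 \ne 0$, $f(x_1, x_2) = 0$, which has a limit equal to zero as $x_2 \to 0$. But, along the parabola $x_2 = x_1^2$,

$$f(x_1, x_1^2) = \frac{1}{2} \quad \text{if } x_1 \ne 0.$$

Hence, $\lim_{x_1 \to 0} f(x_1, x_1^2) = \frac{1}{2} \ne 0$.

(b) The directional derivative in the direction of the unit vector $\mathbf{v} = (v_1, v_2)'$ at the origin is given by

$$v_1\left.\frac{\partial f(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{0}} + v_2\left.\frac{\partial f(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{0}}.$$

This derivative is equal to zero, since

$$\left.\frac{\partial f(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{0}} = \lim_{\Delta x_1 \to 0} \frac{1}{\Delta x_1}\left[\frac{(\Delta x_1)^2 \cdot 0}{(\Delta x_1)^4 + 0} - 0\right] = 0,$$

$$\left.\frac{\partial f(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{0}} = \lim_{\Delta x_2 \to 0} \frac{1}{\Delta x_2}\left[\frac{0 \cdot \Delta x_2}{0 + (\Delta x_2)^2} - 0\right] = 0.$$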


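7.8. The directional derivative of $f$ at a point $\mathbf{x}$ on $C$ in the direction of $\mathbf{v}$ is $\sum_{i=1}^{k} v_i\,\partial f(\mathbf{x})/\partial x_i$. But $\mathbf{v}$ is given by

$$\mathbf{v} = \frac{d\mathbf{g}/dt}{\|d\mathbf{g}/dt\|},$$

where $\mathbf{g}$ is the vector whose $i$th element is $g_i(t)$. Hence,

$$\sum_{i=1}^{k} v_i \frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{1}{\|d\mathbf{g}/dt\|}\sum_{i=1}^{k} \frac{dg_i(t)}{dt}\frac{\partial f(\mathbf{x})}{\partial x_i}.$$

Furthermore,

$$\frac{ds}{dt} = \left\|\frac{d\mathbf{g}}{dt}\right\|.$$

It follows that

$$\sum_{i=1}^{k} v_i \frac{\partial f(\mathbf{x})}{\partial x_i} = \sum_{i=1}^{k} \frac{dt}{ds}\frac{dg_i(t)}{dt}\frac{\partial f(\mathbf{x})}{\partial x_i} = \sum_{i=1}^{k} \frac{dx_i}{ds}\frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{df(\mathbf{x})}{ds}.$$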


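7.9. (a)

$$f(x_1, x_2) \approx f(0, 0) + \left(x_1\frac{\partial}{\partial x_1} + x_2\frac{\partial}{\partial x_2}\right)f(0, 0) + \frac{1}{2!}\left(x_1^2\frac{\partial^2}{\partial x_1^2} + 2x_1x_2\frac{\partial^2}{\partial x_1\,\partial x_2} + x_2^2\frac{\partial^2}{\partial x_2^2}\right)f(0, 0) = 1 + x_1x_2.$$

(b)

$$\begin{aligned}
f(x_1, x_2, x_3) &\approx f(0, 0, 0) + \left(\sum_{i=1}^{3} x_i\frac{\partial}{\partial x_i}\right)f(0, 0, 0) + \frac{1}{2!}\left(\sum_{i=1}^{3} x_i\frac{\partial}{\partial x_i}\right)^2 f(0, 0, 0) \\
&= \sin(1) + x_1\cos(1) + \frac{1}{2}\{x_1^2[\cos(1) - \sin(1)] + 2x_2^2\cos(1)\}.
\end{aligned}$$

(c)

$$f(x_1, x_2) \approx f(0, 0) + \left(x_1\frac{\partial}{\partial x_1} + x_2\frac{\partial}{\partial x_2}\right)f(0, 0) + \frac{1}{2!}\left(x_1^2\frac{\partial^2}{\partial x_1^2} + 2x_1x_2\frac{\partial^2}{\partial x_1\,\partial x_2} + x_2^2\frac{\partial^2}{\partial x_2^2}\right)f(0, 0) = 1,$$

since all first-order and second-order partial derivatives vanish at $\mathbf{x} = \mathbf{0}$.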





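7.10. (a) If $u_1 = f(x_1, x_2)$ and $\partial f/\partial x_1 \ne 0$ at $\mathbf{x}_0$, then by the implicit function theorem (Theorem 7.6.2), there is a neighborhood of $\mathbf{x}_0$ in which the equation $u_1 = f(x_1, x_2)$ can be solved uniquely for $x_1$ in terms of $x_2$ and $u_1$, that is, $x_1 = h(u_1, x_2)$. Thus,

$$f[h(u_1, x_2), x_2] = u_1.$$

Hence, by differentiating this identity with respect to $x_2$, we get

$$\frac{\partial f}{\partial x_1}\frac{\partial h}{\partial x_2} + \frac{\partial f}{\partial x_2} = 0,$$

which gives

$$\frac{\partial h}{\partial x_2} = -\frac{\partial f}{\partial x_2}\Big/\frac{\partial f}{\partial x_1}$$

in a neighborhood of $\mathbf{x}_0$.

(b) On the basis of part (a), we can consider $g(x_1, x_2)$ as a function of $x_2$ and $u_1$, since $x_1 = h(u_1, x_2)$ in a neighborhood of $\mathbf{x}_0$. In such a neighborhood, the partial derivative of $g$ with respect to $x_2$ is

$$\frac{\partial g}{\partial x_1}\frac{\partial h}{\partial x_2} + \frac{\partial g}{\partial x_2} = \frac{\partial(f, g)}{\partial(x_1, x_2)}\Big/\frac{\partial f}{\partial x_1},$$

which is equal to zero because $\partial(f, g)/\partial(x_1, x_2) = 0$. [Recall that $\partial h/\partial x_2 = -(\partial f/\partial x_2)/(\partial f/\partial x_1)$.]

(c) From part (b), $g[h(u_1, x_2), x_2]$ is independent of $x_2$ in a neighborhood of $\mathbf{x}_0$. We can then write $\phi(u_1) = g[h(u_1, x_2), x_2]$. Since $x_1 = h(u_1, x_2)$ is equivalent to $u_1 = f(x_1, x_2)$ in a neighborhood of $\mathbf{x}_0$, $\phi[f(x_1, x_2)] = g(x_1, x_2)$ in this neighborhood.

(d) Let $G(f, g) = g(x_1, x_2) - \phi[f(x_1, x_2)]$. Then, in a neighborhood of $\mathbf{x}_0$, $G(f, g) = 0$. Hence,

$$\frac{\partial G}{\partial f}\frac{\partial f}{\partial x_1} + \frac{\partial G}{\partial g}\frac{\partial g}{\partial x_1} = 0,$$

$$\frac{\partial G}{\partial f}\frac{\partial f}{\partial x_2} + \frac{\partial G}{\partial g}\frac{\partial g}{\partial x_2} = 0.$$

In order for these two identities to be satisfied by values of $\partial G/\partial f$, $\partial G/\partial g$ not all zero, it is necessary that $\partial(f, g)/\partial(x_1, x_2)$ be equal to zero in this neighborhood of $\mathbf{x}_0$.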





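7.11. $u(x_1, x_2, x_3) = u(\xi_1\xi_3, \xi_2\xi_3, \xi_3)$. Hence,

$$\frac{\partial u}{\partial \xi_3} = \frac{\partial u}{\partial x_1}\frac{\partial x_1}{\partial \xi_3} + \frac{\partial u}{\partial x_2}\frac{\partial x_2}{\partial \xi_3} + \frac{\partial u}{\partial x_3}\frac{\partial x_3}{\partial \xi_3} = \xi_1\frac{\partial u}{\partial x_1} + \xi_2\frac{\partial u}{\partial x_2} + \frac{\partial u}{\partial x_3},$$

so that

$$\xi_3\frac{\partial u}{\partial \xi_3} = \xi_1\xi_3\frac{\partial u}{\partial x_1} + \xi_2\xi_3\frac{\partial u}{\partial x_2} + \xi_3\frac{\partial u}{\partial x_3} = x_1\frac{\partial u}{\partial x_1} + x_2\frac{\partial u}{\partial x_2} + x_3\frac{\partial u}{\partial x_3} = nu.$$

Integrating this partial differential equation with respect to $\xi_3$, we get

$$\log u = n\log\xi_3 + \psi(\xi_1, \xi_2),$$

where $\psi(\xi_1, \xi_2)$ is a function of $\xi_1, \xi_2$. Hence,

$$u = \xi_3^n \exp[\psi(\xi_1, \xi_2)] = x_3^n F\left(\frac{x_1}{x_3}, \frac{x_2}{x_3}\right),$$

where $F(\xi_1, \xi_2) = \exp[\psi(\xi_1, \xi_2)]$.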


7.13. (a) The Jacobian determinant is $27x_1^2x_2^2x_3^2$, which is zero in any subset of $R^3$ that contains points on any of the coordinate planes.

(b) The unique inverse function is given by 

Xi=uY^, X2 = uY^, X2=uY^. 

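7.14. Since $\partial(g_1, g_2)/\partial(x_1, x_2) \ne 0$, we can solve for $x_1, x_2$ uniquely in terms of $y_1$ and $y_2$ by Theorem 7.6.2. Differentiating $g_1$ and $g_2$ with respect to $y_1$, we obtain

$$\frac{\partial g_1}{\partial x_1}\frac{\partial x_1}{\partial y_1} + \frac{\partial g_1}{\partial x_2}\frac{\partial x_2}{\partial y_1} + \frac{\partial g_1}{\partial y_1} = 0,$$

$$\frac{\partial g_2}{\partial x_1}\frac{\partial x_1}{\partial y_1} + \frac{\partial g_2}{\partial x_2}\frac{\partial x_2}{\partial y_1} + \frac{\partial g_2}{\partial y_1} = 0.$$

Solving for $\partial x_1/\partial y_1$ and $\partial x_2/\partial y_1$ yields the desired result.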





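7.15. This is similar to Exercise 7.14. Since $\partial(f, g)/\partial(x_1, x_2) \ne 0$, we can solve for $x_1, x_2$ in terms of $x_3$, and we get

$$\frac{dx_1}{dx_3} = -\frac{\partial(f, g)}{\partial(x_3, x_2)}\Big/\frac{\partial(f, g)}{\partial(x_1, x_2)}, \qquad \frac{dx_2}{dx_3} = -\frac{\partial(f, g)}{\partial(x_1, x_3)}\Big/\frac{\partial(f, g)}{\partial(x_1, x_2)}.$$

But

$$\frac{\partial(f, g)}{\partial(x_3, x_2)} = -\frac{\partial(f, g)}{\partial(x_2, x_3)}, \qquad \frac{\partial(f, g)}{\partial(x_1, x_3)} = -\frac{\partial(f, g)}{\partial(x_3, x_1)}.$$

Hence,

$$\frac{dx_1}{\partial(f, g)/\partial(x_2, x_3)} = \frac{dx_2}{\partial(f, g)/\partial(x_3, x_1)} = \frac{dx_3}{\partial(f, g)/\partial(x_1, x_2)}.$$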


7.16. (a) ( — is a point of local minimum. 

(b) 

àf 

= 4aXi —Xj + 1 = 0, 

âx^ ^ ^ 

= — Xi + 2x9 — 1 = 0. 

dX2 


The solution of these two équations is x^ = 1 /(I — Sa), X2 = (4a — 
l)/(8a — 1). The Hessian matrix of / is 




Here, = 4a, and det(A) = 8 a — 1. 

i. Yes, if a > |. 

ii. No, it is not possible. 

iii. Yes, if a < |, a + 0. 

(c) The stationary points are (-2,-2), (4,4). The first is a saddle 
point, and the second is a point of local minimum. 

(d) The stationary points are (V^, — ^/2), (— ^/2,^/2), and (0,0). The 
first two are points of local minimum. At (0,0), /n/22 ~fu ^ 0- 



CHAPTER 7 


607 


this case, h'Ah has the same sign as /u|(o,o) = —4 for ail values of 
h = (h^, h 2 )\ except when 


^ Vil 

or hi~h 2 = 0, in which case hAh = 0. For such values of h, 
(h^V)^/(0,0) = 0, but (h'V)Y(0,0) = 24(/z^ + /z^) = 48/zî, which is 
nonnegative. Hence, the point (0, 0) is a saddlepoint. 

7.19. The Lagrange équations are 

(2 + 8A)x^ + 12^2 = 0, 

I2xi + (4 + 2A)x2 = 0, 

4xf +x| = 25. 

We must hâve 


(4A+ l)(A + 2) -36 = 0. 


The solutions are A^ = —4.25, A 2 = 2. For A^ = —4.25, we hâve the 
points 

i. Xi = 1.5, X 2 = 4.0, 

ii. Xi= —1.5, %2 = -4.0. 

For A 2 = 2, we hâve the points 

iii. Xi = 2, X 2 = —3, 

iv. Xi= —2, X 2 = 3. 

The matrix has the value 


B 



2 + 8 A 
12 
8X1 


12 

4 + 2A 
2 x 2 


8X1 

2x2 . 

0 


The déterminant of B^ is A^. 

At (i), = 5000; the point is a local maximum. 

At (ii), A^ = 5000; the point is a local maximum. 

At (iii) and (iv), A^ = —5000; the points are local minima. 



608 


SOLUTIONS TO SELECTED EXERCISES 


7.21. Let F =xlx\xl + \{xl +x| +x| — c^). Then 

clF 


= x\xl + 2Axi = 0, 


dXi 

âF 

âXi 

âF 

âX^ 


= 2xfx2xj + 2Ax2 = 0, 


= 2xlxlx^ + 2A%3 = 0, 


xf +x? +x? = c^. 


•2 ' -^3 


Since x^X2,X3 cannot be equal to zéro, we must hâve 

0 , 

0 , 

0. 


x|x| + A 
x^xj + A 
x^xl + A 


These équations imply that xl=xl=xj = c^/3 and A= —c'^/9. For 
these values, it can be verified that < 0, A2 > 0, where A^ and A 2 
are the déterminants of and B2, and 


B. = 


B.= 


2x|x| + 2A 

4 XiX 2 x| 

4 x 

IX|X 3 

2x 

4 xiX 2 x| 

2x^x1 + 2A 

4X1X2X3 

2x 

4 xix|x 3 

4X1X2X3 

2xix| + 2A 

2x 

2x^ 

2x2 

2x3 

0 

2x^x1 + 2A 

4X1X2X3 

2x2 



4X1X2X3 

2x^x1 + 2A 

2x3 

• 


2x2 

2x3 

0 




Since n = 3 is odd, the function x^x^xj must attain a maximum value 
given by (c^/3)^. It follows that for ail values of X2, X3, 


x^xjxj < 




that is. 


/ 2 2 2\^/^ ^ 
(x(x|x|) < — 


X? +x| +x| 


3 



CHAPTER 7 


609 


7 . 24 . The domain of intégration, Z), is the région bounded from above by the 
parabola X 2 = 4 x^ —x^ and from below by the parabola X 2 =x^. The 
two parabolas intersect at ( 0 , 0 ) and ( 2 , 4 ). It is then easy to see that 


f^l ^Y(Xi,X2)dX2 dx^=r\f^''" f(Xi,X2)dx^ 

•'O rx? •'O r2--,/4-x. 


v 2 

Xi 


'V^ 


’ 2 -^ 


dx 2 • 


7 . 25 . (a) 1 = f{x^,X 2 )dx^}dx 2 ‘ 

(b) dg/dx^ = filxj dx 2 — 2 xif(xi,l—xl)+f(xi,l—xi). 


7 . 26 . Let U =xl/x^, v=xl/x 2 ‘ Then uv=XiX 2 , and 


But 


Hence, 




à(x^,X2) 

d(u, v) 




â(x^,X2) 


(9(u,a) 

â(u,v) 


à(Xi,X2) 


1 

3 



3 

» 

4 


7 . 28 . Let I(a) = dx/(a +x^). Then 




da 


=T 

Jn 


dx 


0 (a-\-x^) 


3 • 


On the other hand, 




Arctan 





—a Arctan 
4 



+ 


4 a^(a + 3 ) 


]/3 ( 2 a + 3 ) 

2 a^(a + 3)^ 



610 


SOLUTIONS TO SELECTED EXERCISES 


Thus 


ry/3 ^ 

■'o (a+x^f 


3 /T /3 1 

—a ^/^Arctanu — H 7- 

8 ^ a S a^(a + 3) 


}/3 (2a + 3) 

4 a^(a + 3) 


Putting a = 1, we obtain 


^ fT dx 3 ^ 7^/3 

^ = -ArctanV3 + 

0 (1+x^) S 64 


7.29. (a) The marginal density functions of and X 2 are 


fl{Xx) = j\xx+X2)dx- 


= Xi + I, 0 <Xi < 1, 


f2{X2)= f\xi+X2)dXi 


= X2 + 0 <%2 < 1, 


/(Xi,X2) ^/i(Xi)/2(X2). 

The random variables X^ and X 2 are therefore not independent. 


= r f +^ 2 ) dx^ dx2 

Jn J c\ 


0 •'0 


7.30. The marginal densities of X^ and X 2 are 

( 1 +x^, — 1 <x^ < 0, 

f\{xi) = \l-x^, 0<Xi<l, 

1,0, otherwise, 

72(^2) =2X2, 0<X2<1. 



CHAPTER 7 


611 


Note that f(xi,X 2 ) ^/i(xi)/2(x2). Hence, and X 2 are not indepen- 
dent. But 


dxy = 0 , 


ri r ^2 

E{X^X2) = ; / x^X2(bc^ 

•'O ■^-X2 


0 X 

£'(Xi) = / Xi(l +Xi) iici + / Xi(l — Xi) = 0, 
7-1 Jq 

E(X2) = f 2x| <ic2 = I . 

•'O 


Hence, 


E(XiX 2 )=E(X 0 ^(X 2 )= 0 , 


7 . 31 . (a) The joint density function of and 1^2 is 


giy^yi) = 


1 


r(«)r(/3) 


yrHi-yi)^~'y 2 ^^~'e-^^ 


(b) The marginal densities of and 1^2 are, respectively. 


^i(yi) = 


r (« + / 3 ) ^-1 

■yi (i-yi) 


T{a)T{l3) 


0 <yj < 1 , 


B(a, P) 

R?(V'?) = 0 <v 9 <oo. 


(c) Since 5(a, )S) = r(a)r( /3)/r(a + /3), 

^(^ 1 .^ 2 ) =gi{yi)g 2 {y 2 )^ 

and Y^ and Y 2 are therefore independent. 

7 . 32 . Let U=X^, W = X^X 2 . Then, X^ = U, X 2 = W/U. The joint density 
function of U and W is 


g{u,w) = 


lOvp 


U 


2 ’ 


w 


<u < yfw , 0 < W < 1 . 



612 


SOLUTIONS TO SELECTED EXERCISES 


The marginal density function of W is 

gi(w) = — 

•'w ^ 

/I 1 \ 

= lOw^ ^ , 0<w<l. 

\w yw ^ 

7.34. The inverse of this transformation is 

^2 = ^2-^l, 


and the absolute value of the Jacobian déterminant of this transforma- 
tion is 1. Therefore, the joint density function of the Y/s is 

g(yi,y2,---,yn) 

= e"^», 0 <yi <^2 < ••• <y„ < ”• 

The marginal density function of is 



7.37. Let P = ( /3 q, /3i, (32)' ^ The least-squares estimate of p is given by 

p = (X'X)“'x'y, 

where y = (y^, 3 ^ 2 ? • • • ? yn)'? X is a matrix of order nX3 whose first 
column is the column of ones, and whose second and third columns are 
given by the values of xf, / = 1, 2, . . . , n. 

7.38. Maximizing L(x, p) is équivalent to maximizing its natural logarithm. 
Let us therefore consider maximizing logL subject to £f=iA = l- 
Using the method of Lagrange multipliers, let 

I * \ 

F = log L + A E Fi “ 1 • 



CHAPTER 8 


613 


Differentiating F with respect to Pi,P 2 ^‘“^Pk^ equating the 
dérivatives to zéro, we obtain, + A = 0 for / = 1, 2, . . . , A. Combin- 
ing these équations with the constraint £f=iP, = l, we obtain the 
maximum likelihood estimâtes 




n 




CHAPTER 8 

8.1. Let X- = (Xji, ^ ^ 12 ^^ / = 0, 1, 2, . . . . Then 

/(x,. + fh,.) = 8(x,i + thi^Ÿ - 4(^;i + thi^){Xi 2 + thi 2 ) + 5(x,-2 + 

= + b^t + Cj-, 

where 


bi = - 4(x;i/!;2 +Xaha) + 10x;2/îi2> 

C; = 8xfi - 4 x;iX, 2 + 5x^2 . 


The value of t that minimizes /(x^ + ^h,) in the direction of is given 
by the solution of df/dt = 0, namely, = —bj{2a^. Hence, 


where 







/ = 0 , 1 , 2 ,.. 


? 



V/(x,) 

V /( X ,)||2 


i = 0,l,2,.. 


? 


and 


/ = 0 , 1 , 2 , 


W(x,) = (16x,i - 4 x,2, -4x;i + 10X;2)', 


» » » 



614 


SOLUTIONS TO SELECTED EXERCISES 


The results of the itérative minimization procedure are given in the 
following table: 


Itération i 

X,' 

h, 

ti 

üi 

/(x,) 

0 

(5, 2)' 

(-1,0)' 

4.5 

8 

180 

1 

(0.5, 2)' 

(0, - D' 

1.8 

5 

18 

2 

(0.5, 0.2)' 

(-1,0)' 

0.45 

8 

1.8 

3 

(0.05, 0.2)' 

(0, - D' 

0.18 

5 

0.18 

4 

(0.05, 0.02)' 

(-1,0)' 

0.045 

8 

0.018 

5 

(0.005, 0.02)' 

(0, - D' 

0.018 

5 

0.0018 

6 

(0.005, 0.002)' 

(-1,0)' 

0.0045 

8 

0.00018 


Note that > 0, which confirms that t^= —bj{2a^ does indeed 
minimize /(x^ + in the direction of h,. It is obvious that if we were 
to continue with this itérative procedure, the point would converge 
to 0. 

8.3. (a) 

y{x) = 45.690 + 4.919xi + 8.847x2 

— 0 . 270 X 1 X 2 — 4.148xi — 4.298x|. 

(b) The matrix B is 

[-4.148 -0.135' 

[-0.135 -4.298/ 

A 

Its eigenvalues are r^= —8.136, T 2 = —8.754. This makes B a 
négative definite matrix. The results of applying the method of 
ridge analysis inside the région R are given in the following table: 


A 

Xi 

X2 

r 

A 

y 

31.4126 

0.0687 

0.1236 

0.1414 

47.0340 

13.5236 

0.1373 

0.2473 

0.2829 

48.2030 

7.5549 

0.2059 

0.3709 

0.4242 

49.1969 

4.5694 

0.2745 

0.4946 

0.5657 

50.0158 

2.7797 

0.3430 

0.6184 

0.7072 

50.6595 

1.5873 

0.4114 

0.7421 

0.8485 

51.1282 

0.7359 

0.4797 

0.8659 

0.9899 

51.4218 

0.0967 

0.5480 

0.9898 

1.1314 

51.5404 

- 0.4009 

0.6163 

1.1137 

1.2729 

51.4839 

- 0.7982 

0.6844 

1.2376 

1.4142 

51.2523 

Note that the stationary point, Xq = (0.5601, 1.0117)', is a point of 

absolute maximum since 

the eigenvalues of B are 

négative. This 

point falls inside 

R. The corresponding 

maximum 

value of y is 


51.5431. 



CHAPTER 8 


615 


8 . 4 . We know that 


(B — À2l^)x2= 

where x^ and X2 correspond to and A2, respectively, and are such 
that x'^x^ = X2X2 = r^, with being the common value of ri and r|. 
The corresponding values of y are 


i^i = / 3 o + x'iP + x'iBxi, 

y2 = / 3 o + x'2p+X2BX2. 

We then hâve 

x'iBxi - x'2Bx 2 + i(x'i - x'2)P = ( Aj - A2)r^ 

9i -yi = l(x'i - x'2)P + ( Aj - A2)r2. 
Furthermore, from the équations defining x^ and X2, we hâve 

( A2 — Ai)XiX 2 = 2(^1 ~ ^2) P- 

Hence, 

fl-i^2=('^l-^2)('-^-x'iX2). 

But = ||xj|2 IIX2II2 > X1X2, since x^ and X2 are not parallel vectors. 
We conclude that >3^2 whenever A^ > A2. 

8 . 5 . If M(x^) is positive definite, then A^ must be smaller than ail the 
eigenvalues of B (this is based on applying the spectral décomposition 
theorem to B and using Theorem 2 . 3 . 12 ). Similarly, if M(x2) is indefi- 

A 

nite, then A2 must be larger than the smallest eigenvalue of B and also 
smaller than its largest eigenvalue. It follows that Al < A2. Hence, by 
Exercise 8 . 4 , <^2- 

8.6. (a) We know that 


(B-AI,)x= -ip. 


Differentiating with respect to A gives 



616 


SOLUTIONS TO SELECTED EXERCISES 


and since x'x = r^, 


âx 


dr 


X 


= r 


d\ d\ 

A second différentiation with respect to A yields 

dx 


(B -AI,) 


d^X 




= 2 


d\ ’ 


d^x 


dx' dx 


d^r 


X 


d\ 


2 


+ 


d\ dk 


= r 


dk 


+ 


dr \ 
dk 


If we premultiply the second équation by d^x' /dk^ and the fourth 
équation by dx' /dk, sub tract, and transpose, we obtain 

d^x dx' dx 

x ' — ^ -2 =0. 

dk^ dk dk 

Substituting this in the fifth équation, we get 


d^r 


dk 


= 3 


dx’ dx 


dk dk 


dr \ 
dk 


Now, since 


dr 


d 




1/2 


^A 


dx/dk 


= x 


(x'x) 


1/2 


we conclude that 


d^r 


dk 


2 


= 3r 


dx' dx 


dk dk 


- X 


dx ] 

dk 


2 


(b) The expression in (a) can be written as 


d^r 


dk 


= 2r 


2 


dx 

dx' 


l 

2 

+ 

r^ 

— — 

x' — 


dk 

^A 

dk 

l àxj 



The first part on the right-hand side is nonnegative and is zéro only 
when r = 0 or when dx/dk = 0, The second part is nonnegative by 



CHAPTER 8 


617 


the fact that 


/ ^ 

2 

dx 

(’■' ) 

< 

dX 


^ dx' âx 

. 

d\ 


Equality occurs only when x = 0, that is, r = 0, or when ^x/^A = 0. 
But when ^x/^A = 0, we hâve x = 0 if A is different from ail the 
eigenvalues of B, and thus r = 0. It follows that â r/dX > 0 except 
when r = 0 , where it takes the value zéro. 


8.7. (a) 

nCl r 7 

B=—j {£[y(x)] - 77(x)} dx 
a JR 

= ^{(7 - P)'rn (7 - P) - 2(7 - P)'ri2Ô + 8T228}, 

where ^22 are the région moments defined in Section 

8.4.3. 

(b) To minimize B we differentiate it with respect to 7 , equate the 
dérivative to zéro, and solve for 7 . We get 

2rii(7-p)-2ri28 = o, 

7 = P + f^2^ 

= Ct. 

This solution minimizes B, since F^^ is positive definite. 

(c) B is minimized if and only if 7 = Ct. This is équivalent to stating 

A 

that Ct is estimable, since E{\) = y. 

A 

(d) Writing X as a linear function of the vector y of observations of 

A 

the form X = Ly, we obtain 


E(X)=LE(y) 

= L(Xp + Z 8 ) 

= L[X:Z]t. 

But 7 =£'(X) = Ct. We conclude that C = L[X:Z]. 

(e) It is obvious from part (d) that the rows of C are spanned by the 
rows of [X : Z]. 



618 


SOLUTIONS TO SELECTED EXERCISES 


(f) The matrix L defined by L = (X^X) ^X' satisfies the équation 

L[X:Z] = C, 

since 

L[X:Z] = (X^X)“^X'[X:Z] 

= [l:(X'X)"^X'z] 

= [I:MiVMi2] 

= [i:riVri2] 

= c. 

8.8. If the région R is a sphere of radius 1, then 3g^ < 1. Now, 

'1 0 0 0 ' 

0 I 0 0 

Li= 0 0 I 0 ’ 

0 0 0 i 

'0 0 0 ' 

^ _ 0 0 0 
0 0 0 ’ 

0 0 0 

Hence, C = [I 4 : 04 >^ 3 ]. Furthermore, 

X=[l4:D], 



Hence, 

Mil = ?X'X 

'1 0 0 0 ■ 

0 g2 0 0 

^0 0 g2 0 ’ 

0 0 0 g2 



CHAPTER 8 


619 


Mi 2 = iX'Z 

0 0 0 ' 

0 0 

0 0 • 

0 0 

(a) 

Mii = rii => = j 

Mi 2 = Tii ,? = 0 

Thus, it is not possible to choose g so that D satisfies the condi- 
tions in (8.56). 

(b) Suppose that there exists a matrix L of order 4x4 such that 

C = L[X:Z]. 

Then 

I4 = LX, 

0 = LZ. 

The second équation implies that L is of rank 1, while the first 
équation implies that the rank of L is greater than or equal to the 
rank of I4, which is equal to 4. Therefore, it is not possible to find a 
matrix such as L. Hence, g cannot be chosen so that D satisfies the 
minimum bias property described in part (e) of Exercise 8.7. 

8.9. (a) Since A is symmetric, it can be written as A = PA P', where A is a 
diagonal matrix of eigenvalues of A and the columns of P are the 
corresponding orthogonal eigenvectors of A, each of length equal 
to 1 (see Theorem 2.3.10). It is easy to see that over the région i//, 

/*(ô,D)<8'ôe_(A)<r2e_(A), 

by the fact that Cj^^^^CA)! — PA P' is positive semidefinite. Without 
loss of generality, we consider that the diagonal éléments of A are 
written in descending order. The upper bound in the above in- 
equality is attained by /z(ô,D) for ô = rP^ where P^ is the first 
column of P, which corresponds to c^^^(A). 

(b) The design D can be chosen so that is minimized over the 

région R. 

8.10. This is similar to Exercise 8.9. Write T as PA P', where A is a diagonal 
matrix of eigenvalues of T, and P is an orthogonal matrix of corre- 
sponding eigenvectors. Then 8'Tô = u'u, where u = A^/^P'8, and 



620 


SOLUTIONS TO SELECTED EXERCISES 


A(8, D) = 8'Sô = u'A P'SPA u. Hence, over the région <î>, 

8'S8 > Ke^^i A“^/2 P'SPA-^/2 ) . 

But, if S is positive definite, then by Theorem 2.3.9, 


e„i„(A-i/2p'SPA-i/2) =e^.„(PA-ip' S) 


= e„in(T-'S). 


8.11. (a) Using formula (8.52), it can be shown that (see Khuri and Cornell, 
1996, page 229) 


V = 


1 


2) — X\k 


X^(^k "I" 2) — 2.k A 2 — 


A4^ 

Z 


+ 2Æ 


A4(Æ + 1) — X\{k— 1) 
A4( A: + 4) 


where k is the number of input variables (that is, k = 2), 


1 ^ 

h=-llxli, i = l,2 

« « = i 

= ~{2'^ + 2a^) 
n 

8 

? 

n 






n 


4 


n 


? 


and 7î = 2^ + 2Æ + tîq = 8 + /ÎQ is the total number of observations. 
Here, dénotés the design setting of variable x^, i= 1,2; u = 
12 n 

-i- ^ » » » ^ / » 

(b) The quantity V, being a function of Hq, can be minimized with 
respect to Hq. 



CHAPTER 8 


621 


8 . 12 . We hâve that 

(Y - XB)'(Y - XB) = (Y - XB + XB - XB) (Y - XB + XB - XB) 

= (Y-XB)'(Y-XB) + (XB-XB/(XB-XB), 

since 

(Y-XB/(XB-XB) = (X'Y-X'XBy(B-B) =0. 

A A 

Furthermore, since (XB — XB/(XB — XB) is positive semidefinite, then 
by Theorem 2.3.19, 

e,.[(Y-XB)'(Y-XB)] > e,[(Y - XB)'(Y - XB)] , i=l,2,...,r, 

where c^(*) dénotés the ith eigenvalue of a square matrix. If (Y — 
XB)'(Y - XB) and (Y - xèy(Y - XB) are nonsingular, then by multiply- 
ing the eigenvalues on both sides of the inequality, we obtain 

det[(Y-XB)'(Y-XB)] > det[(Y - XB)'(Y - XB)] . 

A 

Equality holds when B = B. 

8 . 13 . For any b, 1 b <e^. Let a = 1 b. Then a < e^~^. Now, let A, dénoté 
the ith eigenvalue of A. Then A, > 0, and by the previous inequality, 

P l P \ 

r[Ai<exp EA, -p . 

^■ = 1 \/=l / 

Hence, 

det(A) < exp[tr(A - I^)] . 


8 . 14 . The likelihood function is proportional to 


[det(V)] '*'^^exp - 


n 

2 



Now, by Exercise 8.13, 

det(SV“i) < exp[tr(SV"i - 1^)] , 



622 


SOLUTIONS TO SELECTED EXERCISES 


since det(SV^) = det(V^/^SV“^/^X tr(SV“0 = tr(V“^/^SV^/2), and 
y-i/ 2 sY-i /2 jg positive semidefinite. Hence, 

tr(ip) • 

This results in the following inequality: 


which is the desired resuit. 

8.16. y(x) = r(x)p, where p = (X'X)“^X'y, and T(x) is as in model (8.47). 
Simultaneous (1 — u) X 100% confidence intervals on f'(x)p for ail x in 
R are of the form 

f ' (x) P + ( ;, [f ' (x) (X' X) - 'f(x) ] . 

For the points x^,X 2 , . . . ,x^, the joint confidence coefficient is at least 
1 — a. 


exp tr(SV < [det(S)] 


■ ^ ^ 

[det(SV“^)]"'^^exp tr(SV^) < exp 

2 2 


CHAPTER 9 

9.1. If f(x) has a continuons dérivative on [0, 1], then by Theorems 3.4.5 
and 4.2.2 we can find a positive constant A such that 

l/(^l) -f{X2)\<A\x^ -%2 

for ail Xi, X 2 in [0, 1]. Thus, by Définition 9.1.2, 

Ô) <AÔ, 

Using now Theorem 9.1.3, we obtain 

\f(x) -b„{x)\< 

fl 

for ail X in [0, 1]. Hence, 

c 

sup \f{x) -b„{x)\< 

0<x<l ^ 


where c = 2 ^ . 



CHAPTER 9 


623 


9.2. We hâve that 


\f(Xi) -f(X2)\< sup |/(^i) -/(^z) 


for ail \xi — %2 

follows that 


< Ô9, and hence for ail 


x^ —X 


< 8^, since < Ô2- It 


sup |/(Xi) -/(Xz)! < sup |/(Zi)-/(Z2) 


A "!— ^^2 I < 


Z ^ Z 2 ^ 8 2 


that is, <^(5^) < <w(Ô2). 

9.3. Suppose that f(x) is uniformly continuons on [a, b]. Then, for a given 
6>0, there exists a positive ^(é) such that \f{x^) ~/(^2) I ^ ^ 

Xp X2 in [a, b] for which \x^ —X2 \ < 8. This implies that w(ô) < e and 
hence w(ô) ^ 0 as 5^0. Vice versa, if <w(ô) ^ 0 as ô ^ 0, then for a 
given 6>0, there exists > 0 such that w(ô)<e if This 

implies that 


/(Xi) -/(X2)| <e 


if 


IXj — X2 


< ô < 


and f(x) must therefore be uniformly continuons on [a, b], 

9.4. By Theorem 9.1.1, there exists a sequence of polynomials, namely the 

Bernstein polynomials that converges to /(x)=|x| uni- 

formly on [-a, a]. Let Pn(x) = b^(x) — b JS)). Then pJO) = 0, and pjx) 
converges uniformly to |x| on [—a, a], since ^„(0) ^ 0 as n ^ 

9.5. The stated condition implies that fQf(x)pJx) dx = 0îor any polynomial 
pJx) of degree n. In particular, if we choose pJx) to be a Bernstein 
polynomial for /(x), then it will converge uniformly to /(x) on [0, 1]. By 
Theorem 6.6.1, 



= 0 . 

Since /(x) is continuons, it must be zéro everywhere on [0, 1]. If not, 
then by Theorem 3.4.3, there exists a neighborhood of a point in [0, 1] 
[at which /(x) # 0] on which /(x) ^ 0. This causes fo[f(x)]^dx to be 
positive, a contradiction. 



624 


SOLUTIONS TO SELECTED EXERCISES 


9.6. Using formula (9.15), we hâve 


f{x) -p{x) = 


1 


n 


(n + 1)! ,-=o 


But |n”=o(^ “ ^i) I ^ (see Prenter, 1975, page 37). Hence, 


T 

sup |/(x) -p(x)\< "7 

a<x<b 4(|2 + 1 ) 


9.7. Using formula (9.14) with «g, « 2 , and « 3 , we obtain 

p(x) =/q(x) log «0 +/^(x)log +/ 2 (^)log ^2 +^ 3 (^)log 


where 


/o(x) = 




\ ^0 / 


X — a 


2 


ÜQ Ü 2 j 


' X — ^ 


\ ^0 ^3 / 


A(X) = 


X — fl 


0 


fll Aq ! 


' X — Ü2 ^ 


\ ^2 / 


X — fl 


fll fl3 J 


/^{x) = 


X — a 


0 


Ü2 CIq / 


X — a 


\Ü2 0.1 J 


X — a 


\Ü2 0^1 


/^(x) = 


X — a 


0 


fl 3 flg / 


X — a 


Ü2 fll / 


' X — Ü2 ^ 


\ ^3 ^2 / 


Values of /(x) = log x and the corresponding values of p{x) at several 
points inside the interval [3.50,3.80] are given below: 


X 

f(x) 

p(x) 

3.50 

1.25276297 

1.25276297 

3.52 

1.25846099 

1.25846087 

3.56 

1.26976054 

1.26976043 

3.60 

1.28093385 

1.28093385 

3.62 

1.28647403 

1.28647407 

3.66 

1.29746315 

1.29746322 

3.70 

1.30833282 

1.30833282 

3.72 

1.31372367 

1.31372361 

3.77 

1.327075 

1.32707487 

3.80 

1.33500107 

1.33500107 



CHAPTER 9 


625 


Using the resuit of Exercise 9.6, an upper bound on the error of 
approximation is given by 


sup \f{x) -p(x)\< 

3.5<x<3 .s 



4 



where h = max-(«.^^ — a-) = 0.10, and 


T4= sup |/(^)(X) 
3.5<x<3 .8 


= sup 

3.5<x<3 .8 




Hence, the desired upper bound is 


tX 6 (0.10 y 

~Ï6~ ~ Ï6 

= 2.5 X 10“^ 


9.8. We hâve that 


-s"{x)Ÿ dx = 


f [f"{x)Ÿdx— f [s"{x)Ÿ dx 

"'a "'a 



(x)[/"(x) — ^'(x)] dx. 


But intégration by parts yields 



(x) [f'(x) — ^"(x)] dx 


=s"{x)[f{x) -^'(x)]|* - /‘V(x)[/'(x) -x'(x)] dx. 

"'a 

The first term on the right-hand side is zéro, since f{x) = at 
x = a,b\ and the second term is also zéro, by the fact that ^"(3:) is a 
constant, say ^"'(x) = Ap over + Hence, 

n — l 

s"'{x)[f{x) -s'{x)]dx= Xi 

i = 0 


/ ^i + l 

[/'(x) — ^'(x)] dx = Q. 




626 


SOLUTIONS TO SELECTED EXERCISES 


It follows that 


l^[f"{x) -s"{x)Ÿ dx = 


dx- f''[s"{x)Ÿ dx, 

''a '^a 


which implies the desired resuit. 


9 . 10 . 


Hence, 


0) 


dX 


r+l 


= (-l) 0102 


r+l ^-e^x 


sup 

0<x<8 


â'"^^h(x, 0 ) 


<50. 


Using inequality (9.36), the integer r is determined such that 


2 

(7TT)! 


-J (50) <0.05. 


The smallest integer that satisfies this inequality is r = 10. The Cheby- 
shev points corresponding to this value of r are where by 

formula (9.18), 



1 2i^l\ 


Z, = 4 + 4 cos 

[ 22 

77 




Using formula (9.37), the Lagrange interpolating polynomial that ap- 
proximates h{x, 0) over [0, 8] is given by 

10 

Pio(x, 0 ) = 01 E [l - 

z = 0 


where f^x) is a polynomial of degree 10 which can be obtained from 
(9.13) by substituting z^ for a, (/ = 0, 1, . . . , 10). 


9 . 11 . We hâve that 


Hence, 


d\{x, a, [3) 
dx‘^ 


= (0.49 - 


max 


10<a:<40 


â\(x, U, /3) 

âX^ 



It can be verified that the function f{ (3) = is strictly monotone 

increasing for 0.06 < /3 < 0.16. Therefore, /3 = 0.16 maximizes fi (3) 



CHAPTER 10 


627 


over this interval. Hence, 

<(0.49-a)(0.16)V“°-^2 

<0.13(0.16)V-°-^2 

= 0.0000619, 

since 0.36 < a< 0.41. Using Theorem 9.3.1, we hâve 

max a, 15) — s{x)\ < (0.0000619) . 

10<x<40 

Here, we considered equally spaced partition points with Ar, = A. Let 
us now choose A such that 


max 

10<x<40 


d\{x, a, (3) 


âx‘ 


3|4A^(0.0000619) <0.001. 

This is satisfied if A < 5.93. Choosing A = 5, the number of knots 
needed is 


40-10 

m = 1 

A 

= 5. 


9.12. 


det(X'X) = [(X 3 — a)^(x 2 “^ 1 ) “ (^3 ~^i )(^2 “ 

The déterminant is maximized when x^= — 1, X 2 = |(1 + x^ = 1. 


CHAPTER 10 


10.1. We hâve that 


1 r7T 1 ^7T 

— / COS nx cos mxdx = — / [cos (n + m)x + cos (n —m)x] dx 

JY J _ — 1 1T J — 


2tt 

0 , n 

1 , n = m > 1, 


1 r7T 

— COS nxsin mxdx 

'TT J _ 


1 


/ TT 

[sin (tî m) X — sin{n — m) x\ dx 


1 ^TT 

— sin 7 ZX sin mxdx 

TT J 


0 for ail m,n, 
1 


2tt J - 


/ TT 

[cos(/r — m)x — cos(tî -\-m)x]dx 

- TT 


_ / 0 , ni^m, 

1 , n = m>\. 



628 


SOLUTIONS TO SELECTED EXERCISES 


10.2. (a) 


(i) Integrating n times by parts [x^ is differentiated and /?„(x) is 
integrated], we obtain 

( x'^Pn{x) dx = {^ for m = 0, 1, . . . , 7î — 1. 

•^-1 

(ii) Integrating n times by parts results in 

r x"p„{x) dx = — C (1 -x'^Ÿ dx. 

J-i Z J 

Letting x = cos 0, we obtain 


( (1 —x^) dx= f ^ 
J-i Jq 


dO 




2n + 1 \ ^ ) 


n>0. 


(b) This is obvions, since (a) is true for m = 0, 1, . . . , n — 1. 

(c) 

/ I ^ 1 ^ f 

Pn(x)dx= / 

-1 •'-1 


2n 


2^ \ n j 


x" + 77„_i(x) 


Pn{x) dx. 


where 7t^_-^{x) dénotés a polynomial of degree n — 1. Hence, 
using the results in (a) and (b), we obtain 


ri 1 

j pl{x)dx= — 


1 

2/7 ^ 

2« + i 

27î 

2" 

. n , 

2/î + 1 



-1 


10.3. (a) 


2n + r 


n > 0. 


r„( ) = cos{ 7Î Arccos 


cos 


(2/- 1)7T 

2n 


(2/ -1)77 

= COS = ü 


for i = 

(b) r;(x) = 


12 n 

^ (n / \/l — ) sin(n Arccos x). Hence, 

(27-1)77 


n 


T’{Q = 




sin 


^0 


for i = 1, 2, 


, n. 


» » » 



CHAPTER 10 


629 


10.4. (a) 


’ = ( - n xe^ /2 1 + ( - n /2 1 




Using formulas (10.21) and (10.24), we hâve 


dH„{x) 

dx 


= ( - l)”x6"'/"( - l)”e-"'/2//„(x) 


n _^2 


xH„{x) -//„+i(x) 

nH^_^(x), by formula (10.25). 


(b) From (10.23) and (10.24), we hâve 


//„ + i(x) =xH^{x) - 


dH^{x) 


Hence, 


dx 

d^H„{x) 


dH,{x) d^H„{x) 

dH„{x) dH„^^{x) 

dHn(x) 

— +^«(^) - in+i)H„{x), 

dH,{x) 


by (a) 


which gives the desired resuit. 


10.5. (a) Using formula (10.18), we can show that 


sin[(7î + 1)^] 


sin 0 


<n 1 


by mathematical induction. Obviously, the inequality is true for 
7î = 1, since 


sin 2 ^ 2 sin 0 cos 0 


sin^ 


sin 0 


<2. 



630 


SOLUTIONS TO SELECTED EXERCISES 


Suppose now that the inequality is true for n=m.To show that it 
is true for n=m-\- l{m > 1): 


sin[(m + 2) 


sin[(m + 1) ^]cos 0 + cos[(m + 1) ^]sin ^ 

sin 6 


sin 0 


<m + l + l= m + 2. 

Therefore, the inequality is true for ail n. 
(b) From Section 10.4.2 we hâve that 


dT^{x) sin nO 

; = ^ ^ 

dx sin 6 


Hence, 


dT„jx) 

dx 


sin nO 


= n 


sin 0 


< rd, 


since 


sinnO/sin 0\<n, which can be proved by induction as in 


(a). Note that as x ^ + 1, that is, as ^ ^ 0 or ^ ^ tt, 


dT„{x) 

dx 




(c) Making the change of variable t = cos 6, we get 


^n(0 , f 


Arccos X 


COS nOdO 


1 

— sin 7î ^ 
n 


Arccos X 


TT 


1 

— sin (tî Arccos x) 
n 

1 

— sin 7î i// , where x = cos if/ 
n 

1 

— sin ï// by (10.18) 

n 


1 


= --Vl -x^U„_^{x). 


n 



CHAPTER 10 


631 


10 . 7 . The first two Laguerre polynomials are Lq(x) = 1 and L^{x)=x — 
û: — 1, as can be seen from applying the Rodrigues formula in Section 
10.6. Now, differentiating H{x, t) with respect to we obtain 

dH{x, t) 
dt 


a-\- 1 
l-t 


X 


( 1-0 




or equivalently, 

2 âH(x, t) 
dt 

Hence, = 7/(x, 0) = 1, and gi(x) = — dH{x, t) / dt\t=Q = x — 

a—1. Thus, gQ(x)=LQ(x) and g^(x)=L^(x). Furthermore, if the 
représentation 

CO / 

is substituted in the above équation, and if the coefficient of t” in the 
resulting sériés is equated to zéro, we obtain 

gn + i{x) + {2n-x+ a+l)g„{x) + (n^ + na) g„_^{x) = 0. 

This is the same relation connecting the Laguerre polynomials 
+ L„(x), and L„_^(x), which is given at the end of Section 

10.6. Since we hâve already established that g„(x) =L„(x) for n = 0, 1, 
we conclude that the same relation holds for ail values of n. 



( 1-0 


(x — CK — l) + (o; + l)t]H{x, t) = 0. 


10 . 8 . Using formula (10.40), we hâve 


/ 3x^ 1 ' 

/?4^(X) =Co + CiX + C2^— - - 

I35x^ 


where 




30x^ 3 

+ — 

8 8 


? 



632 


SOLUTIONS TO SELECTED EXERCISES 



In computing Cq, c^, C 2 , C3, C4 we hâve made use of the fact that 

ri 2 

/ pl{x)dx =- — — , n = 0 ,l, 2 ,..., 

J -1 In -r i 

where p^ix) is the Legendre polynomial of degree n (see Section 

10 . 2 . 1 ). 

10 . 9 . 

f(x) == y + + CiTiix) + c^T^{x) + cj^{x) 

= 1.266066 + 1.130318ri(x) + 0.271495r2(x) 

+ 0.04433Vr3(x) + 0.005474 r4(x) 

= 1.000044 + 0.99731X + 0.4992x^ + 0.177344x^ + 0.043792x^ 

[Note: For more computational details, see Example 7.2 in Ralston 
and Rabinowitz (1978).] 

10 . 10 . Use formula (10.48) with the given values of the central moments. 

10 . 11 . The first six cumulants of the standardized chi-squared distribution 
with five degrees of freedom are = 0, /C2 = 1, K3 = 1.264911064, 
/C4 = 2.40, K5 = 6.0715731, Kg = 19.2. We also hâve that Zqqs = 1.645. 
Applying the Cornish-Fisher approximation for X005? we obtain the 
value Xqo 5 1.921. Thus, Pixs^ > 1.921) == 0.05, where dénotés 
the standardized chi-squared variate with five degrees of freedom. If 
xi dénotés the nonstandardized chi-squared counterpart (with five 
degrees of freedom), then Xs = xt^ + 5. Hence, the correspond- 
ing approximate value of the upper 0.05 quantile of Xs is /ÏÔ^(1.921) 
+ 5 = 11.0747. The actual table value is 11.07. 


10.12. (a) 


•'0 


1 - 


1 


1 


+ 


2X3X1! 22x5x2! 


1 

+ ^ = 0.85564649. 

2^X9 X4! 


1 

2^X7X3! 



CHAPTER 10 


633 


/•^ ^ 'J w/V X •A' 

f e~‘ /^dt^xe-^ ^ &o - + -®2 - 
Jn ° 7. ^ 2 7. 




where 


®« 2 


1 { X 

17^4 2 


®^2 


1 / X 


- H. - 


2 ! 2 


1 { X \^ \ I X 
— - - -1 , 

2!l2/ 12/ 


®* 2 


1 /x\4r/x\^ 


4! \2 


2 -‘U 


0.f- 


— - - -15 - +45 - -15 

6!l2i \2 \2 \2 


Hence, 


0.85562427. 

Jn 


10 . 13 . Using formula (10.21), we hâve that 


g{x)= LKi-i) Hx) 

n = 0 ^ 


= L (-i)X 


« = 0 


1 d"{e-^'/^) 

V2t7 dx" 


by (10.44) 


n = 0 


c„ d'‘d){x) 
^ dx" ’ 


where c„ = ( — l)"n!h„, n = 0, 1, 


» » » 



634 


SOLUTIONS TO SELECTED EXERCISES 


10.14. The moment generating function of any one of the \ is 


2 

^{t) = \ f(x) dx 

— 00 


=r- 

'' —00 


.tx‘ 


A3 d^(j){x) A4 d^<l){x) A3 d%{x) 
~ ~6 dx^ ^ M dx* ^ 72 dx^ 


dx. 


By formula (10.21), 


d^(fy{x) 

<^,(x)//„(x), /r = 0,l,2,... 


Hence, 


— 00 


.tx‘ 


(j){x) + y (/>(x)// 3 (x) + ^(l){x)H^{x) 


2 


=T 

Jn 


00 


21 e‘^' 
0 


+ ^<l>(x)H,(x) 


A A^ 

Hx) + -^cl){x)H^{x) + ^4>{x)Hç,{x) 


dx 


24 


72 




since H^{x) is an odd function, and H^{x) and H^f^x) are even 
functions. It is known that 


f ^-x dx = 2^-^T{n), 

•'O 


where r(/r) is the gamma function 


r(7î) = f e ^ dx = 2 f e ^ dx, 

Ja Jq 


0 


7î > 0. It is easy to show that 


/•CO , 1 /-CO O 

f cf>(x)x^ dx= f 
Jq ^ ’ ^/2^ Jq 


m 

2 ^ -i(m + i) Im + l 

( 1 - 2 ?) + 


2'fÏT 



CHAPTER 11 


635 


where m is an even integer. Hence, 
2 f 4>{x) dx = 

•'O y/TT 


^co ■> 

2 / (f){x)H^{x) dx = 2 I c^(x)(x'^ — 6x^ + 3) dx 

h -^0 

= ^(i-20''/"r(|) 

V 77 

-2^(l-20"^''^r(|) +3(1 -2^)“^''^ 

V 77 


^00 /•‘^ 9 

2 / (/>(x)// 5 (x) dx = 2 j c^(x)(x^ — 15x"^ + 45x^ — 15) 

•'O •'O 


8 _ , 60 
(i-20'’/"r(i)- -(1-20 


-5/2 


r( 


90 


+ ^(1-2?) ^/^r(|)- 15(1-20 

V 77 


- 1/2 


The last three intégrais can be substituted in the formula for i//(0-The 
moment generating function of W is [i//(0]”- On the other hand, the 
moment generating function of a chi-squared distribution with n 
degrees of freedom is (1 — 20~”^^ [see Example 6.9.8 concerning the 
moment generating function of a gamma distribution G(a, /3), of 
which the chi-squared distribution is a spécial case with a = n/a, 
/3 = 2 ]. 


CHAPTER 11 

11.1. (a) 


ÜQ 


1 /-TT 


dx 


TT J - 


2 ç7T 

/ Xdx = 77, 

J Ci 


77 •'O 


a^= — 


n 


1 .77 

J 


TT J - 


COS nxdx 


2 .77 

= — xcos nxdx 
77-^0 


ts>|Ui 



636 


SOLUTIONS TO SELECTED EXERCISES 



2 ^7T 

/ sin nxdx 

riTT Jq 


TTW 




1 


K = -S 

TT J _ 


r 

X 

^ — TT 



sin nxdx 


TT 


X 


= 0 , 

4 / cos3x cosSx 

— cos X ^ ^ ^ ^ h 

TT 3^ 5^ 


1 


a 


0 




r 

sin X 

— TT 



dx 


2 .TT 4 

= — / sinxA:= — , 

TT 'o 'TT 


1 

a^= — 


/' 

sin X 

^ — TT 



cos nxdx 


2 çTT 

— sin X cos nxdx 


1 ^77 

/ [sin(/î + l)x — sin{n — l)x] dx 

Jrt 


TT-Tq 


(-l)" + l 


Sin X 


jL 

tt{ 

n^-l) 

O 

II 

1 , 

^TT 

sin X si 

= - 


TT J 

— TT 


O 

II 

2 

4 

' cos2x 

TT 

TT 

3 


n ^ 1, 


sin nxdx 


+ + + 


15 


35 



CHAPTER 11 


637 


(c) 


1 


a 


0 


= — / (x + x^) dx 

TT — TJ- 


2tt 


2 


= 


« 


3 ’ 

1 r'^ 

— (x +x^)cos nxdx 

TT J — 


2 ^77 

— x^œsnxdx 

ttJq 


= ^COS 7177= ( — 1) 


rt 


n 


n 


2 ’ 


Z? = 


« 


1 r'^ 

— / (x +x^)sin TTXife 

IT J — 


2 ^77 

— xsin nxdx 
77-^0 


1 

(-D--, 


X +X^ = 


77^ / 1 

h 4 — COS X H — sin X 

3 2 


1 1 

— 4| 7COs2xH — sin2x 

2^ 4 


+ 


for — 77 < X < 77. When x = + 77 , the sum of the sériés is |[( — 77 + 
77 ^) + (77 + 77 ^)] = 77 ^, that is, 77 ^ = 77^/3 + 4(1 + 1/2^ + 1/3^ 


+ •••). 

11.2. From Example 11.2.2, we hâve 


CO 


77“ ( “ 1) 

n = l 


n 


COS nx. 


Putting X = + 77, we get 


77“ J21 1 


. 2 00 

= E 


n = l 


n 


2 • 


6 



638 


SOLUTIONS TO SELECTED EXERCISES 


Using X = 0, we obtain 



(- 1 )" 



? 


or equivalently, 


77 


2 



CO 


= E 

n = l 


(- 1 ) 


« + 1 


n 


Adding the two sériés corresponding to tt^/ô and 77^/12, we get 


that is, 


377 ^^^ 1 

lE ^ (2„ - 1)^ ’ 

— = y 1 

8 h (2n - 1)^ ■ 


11 . 3 . By Theorem 11.3.1, we hâve 

CO 

f'{x) = (nb^ cos wc — na^sinnx). 

n = l 

Furthermore, inequality (11.28) indicates that = n^al) is a 

convergent sériés. It follows that nb^ 0 and na^ ^ 0 as n ^ ^ (see 
Resuit 5.2.1 in Section 5.2). 

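11.4. For $m > n$, we have

$$|s_m(x) - s_n(x)| = \left|\sum_{k=n+1}^{m}(a_k\cos kx + b_k\sin kx)\right| \le \sum_{k=n+1}^{m}(a_k^2 + b_k^2)^{1/2}$$

$$= \sum_{k=n+1}^{m}\frac{1}{k}(\alpha_k^2 + \beta_k^2)^{1/2} \qquad [\text{see } (11.\ldots)]$$

$$\le \left[\sum_{k=n+1}^{m}\frac{1}{k^2}\right]^{1/2}\left[\sum_{k=n+1}^{m}(\alpha_k^2 + \beta_k^2)\right]^{1/2},$$

where the last step is the Cauchy–Schwarz inequality. But, by inequality (11.28),

$$\sum_{k=n+1}^{m}(\alpha_k^2 + \beta_k^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi}[f'(x)]^2\,dx,$$

and

$$\sum_{k=n+1}^{m}\frac{1}{k^2} \le \sum_{k=n+1}^{\infty}\frac{1}{k^2} < \int_n^{\infty}\frac{dx}{x^2} = \frac{1}{n}.$$

Thus,

$$|s_m(x) - s_n(x)| < \frac{c}{\sqrt{n}},$$

where

$$c^2 = \frac{1}{\pi}\int_{-\pi}^{\pi}[f'(x)]^2\,dx.$$

By Theorem 11.3.1(b), $s_m(x) \to f(x)$ on $[-\pi, \pi]$ as $m \to \infty$. Thus, by letting $m \to \infty$, we get

$$|f(x) - s_n(x)| < \frac{c}{\sqrt{n}}.$$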
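The bound $|f(x) - s_n(x)| < c/\sqrt{n}$ can be illustrated on $f(x) = x^2$, whose Fourier series is recalled in Exercise 11.2. This sketch assumes $x^2$ satisfies the hypotheses of Theorem 11.3.1 (it is continuous with $f(-\pi) = f(\pi)$ and has a continuous derivative $2x$), so that $c^2 = (1/\pi)\int_{-\pi}^{\pi}4x^2\,dx = 8\pi^2/3$. The maximum error over a grid is compared with $c/\sqrt{n}$; the grid size is an arbitrary choice.

```python
import math

def s_n(x, n):
    """Partial sum of x^2 = pi^2/3 + 4 * sum_k (-1)^k cos(kx)/k^2 (Example 11.2.2)."""
    return math.pi**2 / 3 + 4 * sum((-1) ** k * math.cos(k * x) / k**2 for k in range(1, n + 1))

# For f(x) = x^2, f'(x) = 2x, so c^2 = (1/pi) * integral of 4x^2 over [-pi, pi] = 8*pi^2/3.
c = math.sqrt(8 * math.pi**2 / 3)
grid = [-math.pi + 2 * math.pi * i / 2000 for i in range(2001)]

results = {}
for n in (5, 20, 80):
    max_err = max(abs(x * x - s_n(x, n)) for x in grid)
    results[n] = (max_err, c / math.sqrt(n))
    print(n, max_err, c / math.sqrt(n))
```

The observed error (largest at $x = \pm\pi$) is well below the bound, which is not sharp for this function.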


11.5. (a) From the proof of Theorem 11.3.2, we have

$$\sum_{n=1}^{\infty}\frac{b_n}{n}\cos n\pi = \frac{A_0}{2} - \frac{a_0\pi}{2}.$$

This implies that the series $\sum_{n=1}^{\infty}(-1)^n b_n/n$ is convergent.

(b) From the proof of Theorem 11.3.2, we have

$$\int_{-\pi}^{x}f(t)\,dt = \frac{a_0(\pi + x)}{2} + \sum_{n=1}^{\infty}\left[\frac{a_n}{n}\sin nx - \frac{b_n}{n}(\cos nx - \cos n\pi)\right].$$

Putting $x = 0$, we get

$$\int_{-\pi}^{0}f(t)\,dt = \frac{a_0\pi}{2} - \sum_{n=1}^{\infty}\frac{b_n}{n}[1 - (-1)^n].$$

This implies convergence of the series $\sum_{n=1}^{\infty}(b_n/n)[1 - (-1)^n]$. Since $\sum_{n=1}^{\infty}(-1)^n b_n/n$ is convergent by part (a), then so is $\sum_{n=1}^{\infty}b_n/n$.



640 


SOLUTIONS TO SELECTED EXERCISES 


11.6. Using the hint and part (b) of Exercise 11.5, the sériés 

would be convergent, where h^ = l/\ogn. However, the sériés 
^/{n\ogn) is divergent by Maclaurin’s intégral test (see Theo- 
rem 6.5.4). 


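11.7. (a) From Example 11.2.1, we have

$$\frac{x}{2} = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin nx$$

for $-\pi < x < \pi$. Hence,

$$\int_0^x\frac{t}{2}\,dt = -\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2}\cos nx + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2}.$$

Note that

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{x^2}{4}\,dx = \frac{\pi^2}{12}.$$

Hence,

$$\frac{x^2}{4} = \frac{\pi^2}{12} - \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2}\cos nx, \qquad -\pi < x < \pi.$$

(b) This follows directly from part (a).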

11.8. Using the result in Exercise 11.7, we obtain

$$\int_0^x\left(\frac{\pi^2}{12} - \frac{t^2}{4}\right)dt = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2}\int_0^x\cos nt\,dt = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^3}\sin nx, \qquad -\pi < x < \pi.$$

Hence,

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^3}\sin nx = \frac{\pi^2}{12}x - \frac{x^3}{12}, \qquad -\pi < x < \pi.$$
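The identities of Exercises 11.7 and 11.8 can be spot-checked by summing many terms of each series at a few points of $(-\pi, \pi)$. This is an illustration only; the truncation level `N` and the test points are arbitrary choices.

```python
import math

def cos_series(x, N=20_000):
    """Partial sum of sum_{n>=1} (-1)^(n+1) cos(nx) / n^2 (Exercise 11.7)."""
    return sum((-1) ** (n + 1) * math.cos(n * x) / n**2 for n in range(1, N + 1))

def sin_series(x, N=20_000):
    """Partial sum of sum_{n>=1} (-1)^(n+1) sin(nx) / n^3 (Exercise 11.8)."""
    return sum((-1) ** (n + 1) * math.sin(n * x) / n**3 for n in range(1, N + 1))

for x in (-2.5, -1.0, 0.3, 1.7, 3.0):
    print(x,
          cos_series(x), math.pi**2 / 12 - x**2 / 4,
          sin_series(x), (math.pi**2 * x - x**3) / 12)
```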


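11.9.

$$F(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-x^2}e^{-iwx}\,dx,$$

$$F'(w) = \frac{i}{4\pi}\int_{-\infty}^{\infty}(-2xe^{-x^2})e^{-iwx}\,dx$$

(exchanging the order of integration and differentiation is permissible here by an extension of Theorem 7.10.1). Integrating by parts, we obtain

$$F'(w) = \frac{i}{4\pi}\left\{\left[e^{-x^2}e^{-iwx}\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}e^{-x^2}(-iw)e^{-iwx}\,dx\right\} = -\frac{w}{4\pi}\int_{-\infty}^{\infty}e^{-x^2}e^{-iwx}\,dx = -\frac{w}{2}F(w).$$

The general solution of this differential equation is

$$F(w) = Ae^{-w^2/4},$$

where $A$ is a constant. Putting $w = 0$, we obtain

$$A = F(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2\pi} = \frac{1}{2\sqrt{\pi}}.$$

Hence, $F(w) = \dfrac{1}{2\sqrt{\pi}}e^{-w^2/4}$.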
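The closed form $F(w) = e^{-w^2/4}/(2\sqrt{\pi})$ can be compared with a direct numerical evaluation of $(1/2\pi)\int e^{-x^2}e^{-iwx}\,dx$. In this sketch the real line is truncated to $[-L, L]$ with $L = 10$ (an assumption justified by the rapid decay of $e^{-x^2}$), and Simpson's rule is applied to the complex integrand; `fourier_transform` is a hypothetical helper name.

```python
import cmath
import math

def fourier_transform(f, w, L=10.0, m=4000):
    """F(w) = (1/(2*pi)) * integral f(x) e^{-iwx} dx, truncated to [-L, L], by Simpson's rule."""
    h = 2 * L / m
    total = 0j
    for k in range(m + 1):
        x = -L + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * f(x) * cmath.exp(-1j * w * x)
    return total * h / 3 / (2 * math.pi)

f = lambda x: math.exp(-x * x)
for w in (0.0, 1.0, 2.5):
    numeric = fourier_transform(f, w)
    closed_form = math.exp(-w * w / 4) / (2 * math.sqrt(math.pi))
    print(w, numeric.real, abs(numeric.imag), closed_form)
```

The imaginary part vanishes (up to rounding) because $e^{-x^2}$ is even.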

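11.10. Let $H(w)$ be the Fourier transform of $(f * g)(x)$. Then

$$H(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}f(x-y)g(y)\,dy\right]e^{-iwx}\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}g(y)\left[\int_{-\infty}^{\infty}f(x-y)e^{-iwx}\,dx\right]dy$$

$$= \int_{-\infty}^{\infty}g(y)e^{-iwy}\left[\frac{1}{2\pi}\int_{-\infty}^{\infty}f(x-y)e^{-iw(x-y)}\,dx\right]dy = F(w)\int_{-\infty}^{\infty}g(y)e^{-iwy}\,dy = 2\pi F(w)G(w).$$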
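The identity $H(w) = 2\pi F(w)G(w)$ can be checked numerically. As a hypothetical test case, take $f = g = e^{-x^2}$ (the function of Exercise 11.9). The sketch computes the convolution on a grid, then its transform, and compares the result with $2\pi F(w)G(w)$. For this choice, $(f*g)(x) = \sqrt{\pi/2}\,e^{-x^2/2}$ and $H(w) = e^{-w^2/2}/2$. Truncating to $[-8, 8]$ and using Simpson's rule are assumptions of the sketch.

```python
import cmath
import math

L, m = 8.0, 400                      # truncate the real line to [-L, L]; m must be even
h = 2 * L / m
xs = [-L + k * h for k in range(m + 1)]
simpson_w = [1 if k in (0, m) else (4 if k % 2 else 2) for k in range(m + 1)]

def integrate(values):
    """Simpson's rule for samples taken on the grid xs."""
    return h / 3 * sum(w * v for w, v in zip(simpson_w, values))

def ft(values, w):
    """(1/(2*pi)) * integral v(x) e^{-iwx} dx for samples v on the grid xs."""
    return integrate([v * cmath.exp(-1j * w * x) for v, x in zip(values, xs)]) / (2 * math.pi)

f = lambda x: math.exp(-x * x)
g = lambda x: math.exp(-x * x)

# (f * g)(x) = integral f(x - y) g(y) dy, evaluated at every grid point x.
conv = [integrate([f(x - y) * g(y) for y in xs]) for x in xs]
f_vals = [f(x) for x in xs]
g_vals = [g(x) for x in xs]

for w in (0.0, 0.7, 1.5):
    H = ft(conv, w)
    rhs = 2 * math.pi * ft(f_vals, w) * ft(g_vals, w)
    print(w, H.real, rhs.real)
```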


11.11. Apply the results in Exercises 11.9 and 11.10 to find the Fourier 
transform of f(x); then apply the inverse Fourier transform theorem 
(Theorem 11.6.3) to find f(x).

11.12. (a) Since $\sin[k(\pi - \pi/n)] = (-1)^{k+1}\sin(k\pi/n)$,

$$s_n(x_n) = \sum_{k=1}^{n}\frac{2(-1)^{k+1}}{k}\sin\left[k\left(\pi - \frac{\pi}{n}\right)\right] = \sum_{k=1}^{n}\frac{2\sin(k\pi/n)}{k} = 2\sum_{k=1}^{n}\frac{\sin(k\pi/n)}{k\pi/n}\cdot\frac{\pi}{n}.$$

By dividing the interval $[0, \pi]$ into $n$ subintervals $[(k-1)\pi/n, k\pi/n]$, $1 \le k \le n$, each of length $\pi/n$, it is easy to see that $s_n(x_n)$ is a Riemann sum $S(P, g)$ for the function $g(x) = (2\sin x)/x$. Here, $g(x)$ is evaluated at the right-hand end point of each subinterval of the partition $P$ (see Section 6.1). Thus,

$$\lim_{n\to\infty}s_n(x_n) = \int_0^{\pi}\frac{2\sin x}{x}\,dx.$$
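The limit in Exercise 11.12 is the Gibbs overshoot. The sketch below evaluates $s_n(x_n)$ directly from the series for $x$ at $x_n = \pi - \pi/n$ and compares it with the integral $\int_0^{\pi}(2\sin x)/x\,dx$, computed by Simpson's rule. The helper names are hypothetical. Because the Riemann sum uses right-hand end points, its error shrinks roughly like $\pi/n$.

```python
import math

def s_n_at_x_n(n):
    """Partial sum s_n of x = 2 * sum (-1)^(k+1) sin(kx)/k, evaluated at x_n = pi - pi/n."""
    x = math.pi - math.pi / n
    return 2 * sum((-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1))

def simpson(f, a, b, m=1000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3

g = lambda x: 2 * math.sin(x) / x if x != 0 else 2.0   # removable singularity at x = 0
limit = simpson(g, 0.0, math.pi)

for n in (10, 100, 1000):
    print(n, s_n_at_x_n(n), limit)
print("overshoot above pi:", limit - math.pi)
```

The limit (about 3.7039) exceeds $\pi$, even though the function being expanded never exceeds $\pi$ on $(-\pi, \pi)$.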



11.13. (a) $y = 8.512 + 3.198\cos\phi - 0.922\sin\phi + 1.903\cos 2\phi + 3.017\sin 2\phi$.
(b) Estimates of the locations of minimum and maximum resistance are $\phi = 0.7944\pi$ and $\phi = 0.1153\pi$, respectively. [Note: For more details, see Kupper (1972), Section 5.]


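11.15. (a)

$$\frac{1}{\sigma\sqrt{n}}\left(\sum_{j=1}^{n}Y_j - n\mu\right) = \frac{1}{\sigma\sqrt{n}}\sum_{j=1}^{n}U_j,$$

where $U_j = Y_j - \mu$, $j = 1, 2, \ldots, n$. Note that $E(U_j) = 0$ and $\mathrm{Var}(U_j) = \sigma^2$. The characteristic function of this standardized sum is

$$\phi_n(t) = \prod_{j=1}^{n}E\left(e^{itU_j/(\sigma\sqrt{n})}\right).$$

Now,

$$E\left(e^{itU_j/(\sigma\sqrt{n})}\right) = E\left[1 + \frac{itU_j}{\sigma\sqrt{n}} - \frac{t^2U_j^2}{2n\sigma^2} + \cdots\right] = 1 + 0 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right).$$

Hence,

$$\phi_n(t) = \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n.$$

Let

$$\omega_n = -\frac{t^2}{2n} + o\left(\frac{1}{n}\right).$$

Then

$$\phi_n(t) = (1 + \omega_n)^n = \left[(1 + \omega_n)^{1/\omega_n}\right]^{n\omega_n}, \qquad n\omega_n = -\frac{t^2}{2} + o(1).$$

As $n \to \infty$, $\omega_n \to 0$ and $(1 + \omega_n)^{1/\omega_n} \to e$. Thus,

$$\phi_n(t) \to e^{-t^2/2} \quad \text{as } n \to \infty.$$

(b) $e^{-t^2/2}$ is the characteristic function of the standard normal distribution $Z$. Hence, by Theorem 11.7.3, as $n \to \infty$, $(\sum_{j=1}^{n}Y_j - n\mu)/(\sigma\sqrt{n})$ converges in distribution to $Z$.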
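To see $\phi_n(t) \to e^{-t^2/2}$ concretely, take, purely as an illustrative assumption, $Y_j \sim U(0, 1)$. Then $\mu = 1/2$, $\sigma^2 = 1/12$, and each $U_j$ is uniform on $[-1/2, 1/2]$, with characteristic function $\sin(s/2)/(s/2)$. The $n$-fold product is therefore available in closed form, and the sketch evaluates it for increasing $n$.

```python
import math

SIGMA = math.sqrt(1 / 12)   # standard deviation of U(0, 1)

def phi_n(t, n):
    """Characteristic function of (sum Y_j - n*mu)/(sigma*sqrt(n)) for Y_j ~ U(0, 1).

    Each U_j = Y_j - 1/2 is uniform on [-1/2, 1/2], so E[exp(i s U_j)] = sin(s/2)/(s/2),
    which is real; the product over j = 1, ..., n is that value raised to the power n.
    """
    s = t / (SIGMA * math.sqrt(n))
    single = math.sin(s / 2) / (s / 2) if s != 0 else 1.0
    return single ** n

for t in (0.5, 1.0, 2.0):
    print(t, [round(phi_n(t, n), 6) for n in (1, 10, 1000)], round(math.exp(-t * t / 2), 6))
```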


CHAPTER 12 


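12.1. (a)

$$\int_1^n\log x\,dx \approx \frac{n-1}{2(n-1)}\left[\log 1 + \log n + 2\sum_{i=2}^{n-1}\log i\right]$$

$$= \frac{1}{2}\log n + \log 2 + \log 3 + \cdots + \log(n-1)$$

$$= \frac{1}{2}\log n + \log(n!) - \log n$$

$$= \log(n!) - \frac{1}{2}\log n.$$

(b) Since $\int_1^n\log x\,dx = n\log n - n + 1$, we have $n\log n - n + 1 \approx \log(n!) - \frac{1}{2}\log n$. Hence,

$$\log(n!) \approx \left(n + \frac{1}{2}\right)\log n - n + 1,$$

$$n! \approx \exp\left[\left(n + \frac{1}{2}\right)\log n - n + 1\right].$$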
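The approximation of part (b), which comes from the trapezoidal rule with unit spacing, can be compared with the exact $\log(n!)$, obtained here from `math.lgamma`. Stirling's formula replaces the constant 1 by $\log\sqrt{2\pi}$, so one expects the ratio $n!/\exp[(n + \frac{1}{2})\log n - n + 1]$ to approach $\sqrt{2\pi}/e \approx 0.922$ rather than 1. This is a small illustrative sketch.

```python
import math

def log_factorial(n):
    return math.lgamma(n + 1)                  # exact log(n!)

def log_approx(n):
    return (n + 0.5) * math.log(n) - n + 1     # result of part (b)

for n in (5, 20, 100, 1000):
    ratio = math.exp(log_factorial(n) - log_approx(n))
    print(n, log_factorial(n), log_approx(n), ratio)

print("limit of the ratio:", math.sqrt(2 * math.pi) / math.e)
```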

12.2. Applying formula (12.10), we obtain the following approximate values for the integral $\int_0^1 dx/(1+x)$:

n     Approximation
2     0.69325395
4     0.69315450
8     0.69314759
16    0.69314708

The exact value of the integral is $\log 2 = 0.69314718$. Using formula (12.12) with $n = 8$, the error of approximation is less than or equal to

$$\frac{M_4(b-a)^5}{2880n^4} = \frac{M_4}{2880(8)^4},$$

where $M_4$ is an upper bound on $|f^{(4)}(x)|$ for $0 \le x \le 1$. Here, $f(x) = 1/(1+x)$, and

$$f^{(4)}(x) = \frac{24}{(1+x)^5}.$$

Hence, $M_4 = 24$, and

$$\frac{24}{2880(8)^4} \approx 0.000002.$$
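The table can be reproduced with composite Simpson's rule using $2n$ subintervals of width $h = (b-a)/(2n)$. This reading of formula (12.10) is an assumption, but it matches the tabulated values for $n = 2$ and $n = 4$ exactly (the last digits for larger $n$ differ by about $10^{-7}$). The sketch also checks that the actual error stays below the bound $M_4(b-a)^5/(2880n^4)$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with 2n subintervals of width h = (b - a)/(2n)."""
    h = (b - a) / (2 * n)
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n))
    return s * h / 3

f = lambda x: 1 / (1 + x)
M4 = 24                                    # max of |f''''(x)| = 24/(1+x)^5 on [0, 1]
for n in (2, 4, 8, 16):
    approx = simpson(f, 0.0, 1.0, n)
    bound = M4 * (1.0 - 0.0) ** 5 / (2880 * n**4)
    print(n, f"{approx:.8f}", abs(approx - math.log(2)), bound)
```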


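12.3. Let

$$z = \frac{2\theta - \pi/2}{\pi/2}, \qquad -1 \le z \le 1.$$

Then

$$\int_0^{\pi/2}\sin\theta\,d\theta = \frac{\pi}{4}\int_{-1}^{1}\sin\left[\frac{\pi}{4}(1+z)\right]dz \approx \frac{\pi}{4}\sum_{i=0}^{2}\omega_i\sin\left[\frac{\pi}{4}(1+z_i)\right].$$

Using the information in Example 12.4.1, we have $z_0 = -\sqrt{3/5}$, $z_1 = 0$, $z_2 = \sqrt{3/5}$, $\omega_0 = 5/9$, $\omega_1 = 8/9$, $\omega_2 = 5/9$. Hence,

$$\int_0^{\pi/2}\sin\theta\,d\theta \approx \frac{\pi}{4}\left\{\frac{5}{9}\sin\left[\frac{\pi}{4}\left(1 - \sqrt{3/5}\right)\right] + \frac{8}{9}\sin\frac{\pi}{4} + \frac{5}{9}\sin\left[\frac{\pi}{4}\left(1 + \sqrt{3/5}\right)\right]\right\} = 1.0000081.$$

The error of approximation associated with $\int_{-1}^{1}\sin[(\pi/4)(1+z)]\,dz$ is obtained by applying formula (12.16) with $a = -1$, $b = 1$, $f(z) = \sin[(\pi/4)(1+z)]$, $n = 2$. Thus, $f^{(6)}(\xi) = -(\pi/4)^6\sin[(\pi/4)(1+\xi)]$, $-1 < \xi < 1$, and the error is therefore less than

$$\frac{2^7[3!]^4}{7[6!]^3}\left(\frac{\pi}{4}\right)^6.$$

Consequently, the error associated with $\int_0^{\pi/2}\sin\theta\,d\theta$ is less than

$$\frac{\pi}{4}\cdot\frac{2^7[3!]^4}{7[6!]^3}\left(\frac{\pi}{4}\right)^6 = \frac{2^7[3!]^4}{7[6!]^3}\left(\frac{\pi}{4}\right)^7 = 0.0000117.$$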
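The three-point Gauss–Legendre computation of Exercise 12.3 takes only a few lines. The nodes $0, \pm\sqrt{3/5}$ and the weights $8/9, 5/9$ are those quoted from Example 12.4.1; the sketch also evaluates the error bound and confirms that the actual error (the exact integral is 1) lies below it.

```python
import math

nodes = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
weights = [5 / 9, 8 / 9, 5 / 9]

# Change of variable theta = (pi/4)(1 + z) maps [-1, 1] onto [0, pi/2].
approx = (math.pi / 4) * sum(w * math.sin(math.pi / 4 * (1 + z)) for z, w in zip(nodes, weights))

# Bound from formula (12.16) with n = 2, multiplied by pi/4 for the change of variable.
bound = (math.pi / 4) * 2**7 * math.factorial(3) ** 4 / (7 * math.factorial(6) ** 3) * (math.pi / 4) ** 6

print(approx, abs(approx - 1.0), bound)
```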

12.4. (a) The inequality is obvious from the hint.

(b) Let $m = 3$. Then,

$$\int_m^{\infty}e^{-x^2}\,dx < \frac{e^{-m^2}}{m} = \frac{e^{-9}}{3} = 0.0000411.$$

(c) Apply Simpson's method on $[0, 3]$ with $h \le 0.2$. Here,

$$f^{(4)}(x) = (12 - 48x^2 + 16x^4)e^{-x^2},$$

with a maximum of 12 at $x = 0$ over $[0, 3]$. Hence, $M_4 = 12$. Using the fact that $h = (b-a)/(2n)$, we get $n = (b-a)/(2h) \ge 3/0.4 = 7.5$. Formula (12.12), with $n = 8$, gives

$$\frac{nM_4h^5}{90} \le \frac{8(12)(0.2)^5}{90} = 0.00034.$$

Hence, the total error of approximation for $\int_0^{\infty}e^{-x^2}\,dx$ is less than $0.0000411 + 0.00034 = 0.00038$. This makes the approximation correct to three decimal places.
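The two pieces of Exercise 12.4 can be combined numerically. The sketch computes Simpson's rule on $[0, 3]$ with $n = 8$ (so $h = 3/16 < 0.2$) and the tail bound $e^{-9}/3$. It then compares the result with the known value $\int_0^{\infty}e^{-x^2}\,dx = \sqrt{\pi}/2$, which is used here only as a reference.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with 2n subintervals of width h = (b - a)/(2n)."""
    h = (b - a) / (2 * n)
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n))
    return s * h / 3

f = lambda x: math.exp(-x * x)
m = 3
tail_bound = math.exp(-m * m) / m          # part (b): bound on the integral over [3, inf)
approx = simpson(f, 0.0, m, 8)             # part (c): n = 8, h = 3/16
exact = math.sqrt(math.pi) / 2             # reference value of the integral over [0, inf)
print(approx, exact, abs(approx - exact), tail_bound)
```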




12.6. (a) Let $u = (2t - x)/x$. Then

$$I(x) = \frac{x}{2}\int_{-1}^{1}\frac{du}{1 + \frac{1}{4}x^2(u+1)^2} = 2x\int_{-1}^{1}\frac{du}{4 + x^2(u+1)^2}.$$

(b) Applying a Gauss–Legendre approximation of $I(x)$ with $n = 4$, we obtain

$$I(x) \approx 2x\sum_{i=0}^{4}\frac{\omega_i}{4 + x^2(u_i + 1)^2},$$

where $u_0$, $u_1$, $u_2$, $u_3$, and $u_4$ are the five zeros of the Legendre polynomial $P_5(u)$ of degree 5. Values of these zeros and the corresponding $\omega_i$'s are given below:

$u_0 = 0.90617985$,    $\omega_0 = 0.23692689$,
$u_1 = 0.53846931$,    $\omega_1 = 0.47862867$,
$u_2 = 0$,             $\omega_2 = 0.56888889$,
$u_3 = -0.53846931$,   $\omega_3 = 0.47862867$,
$u_4 = -0.90617985$,   $\omega_4 = 0.23692689$

(see Table 7.5 in Shoup, 1984, page 202).
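The substitution in part (a) implies that the integrand of $I(x)$ is $1/(1+t^2)$ on $[0, x]$, so one expects $I(x) = \arctan x$. This is an inference from the substitution, used here only as a reference value. The sketch applies the five-point rule with the tabulated nodes and weights and compares the result with `math.atan`. Accuracy degrades as $x$ grows because the integrand's complex poles move closer to the interval.

```python
import math

# Zeros of P_5 and the corresponding weights, as tabulated above.
nodes = [0.90617985, 0.53846931, 0.0, -0.53846931, -0.90617985]
weights = [0.23692689, 0.47862867, 0.56888889, 0.47862867, 0.23692689]

def I(x):
    """Five-point Gauss-Legendre approximation of 2x * integral_{-1}^{1} du / (4 + x^2 (u + 1)^2)."""
    return 2 * x * sum(w / (4 + x * x * (u + 1) ** 2) for u, w in zip(nodes, weights))

for x in (0.5, 1.0, 2.0):
    print(x, I(x), math.atan(x))
```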


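12.7. $\int_0^1(\cos x)^{\lambda}\,dx = \int_0^1 e^{\lambda\log\cos x}\,dx$. Using the method of Laplace, the function $h(x) = \log\cos x$ has a single maximum in $[0, 1]$ at $x = 0$. Formula (12.28) gives

$$\int_0^1 e^{\lambda\log\cos x}\,dx \sim \left[-\frac{\pi}{2\lambda h''(0)}\right]^{1/2}$$

as $\lambda \to \infty$, where

$$h''(0) = -\frac{1}{\cos^2 x}\bigg|_{x=0} = -1.$$

Hence,

$$\int_0^1(\cos x)^{\lambda}\,dx \sim \sqrt{\frac{\pi}{2\lambda}} \quad \text{as } \lambda \to \infty.$$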
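The Laplace approximation $\int_0^1(\cos x)^{\lambda}\,dx \sim \sqrt{\pi/(2\lambda)}$ can be compared with a direct quadrature as $\lambda$ grows. This is a sketch; the Simpson step count is an arbitrary choice that is fine enough to resolve the narrow peak at $x = 0$ for the values of $\lambda$ used.

```python
import math

def simpson(f, a, b, m=20000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3

ratios = {}
for lam in (10, 100, 1000):
    numeric = simpson(lambda x: math.cos(x) ** lam, 0.0, 1.0)
    laplace = math.sqrt(math.pi / (2 * lam))
    ratios[lam] = numeric / laplace
    print(lam, numeric, laplace, ratios[lam])
```

The ratio approaches 1 as $\lambda$ increases, with a relative correction of order $1/\lambda$.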


12.9.

$$\int_0^1\int_0^{(1-x^2)^{1/2}}(\cdots)\,dy\,dx = \frac{\pi}{6} = 0.52359878.$$

Applying the 16-point Gauss–Legendre rule to both formulas gives the approximate value 0.52362038.

[Note: For more details, see Stroud (1971, Section 1.6).]


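12.10.

$$\int_0^{\infty}\frac{dx}{(1+x^2)^n} = \int_0^{\infty}e^{-n\log(1+x^2)}\,dx.$$

Apply the method of Laplace with $\varphi(x) = 1$ and

$$h(x) = -\log(1+x^2).$$

This function has a single maximum in $[0, \infty)$ at $x = 0$. Using formula (12.28), we get

$$\int_0^{\infty}e^{-n\log(1+x^2)}\,dx \sim \left[-\frac{\pi}{2nh''(0)}\right]^{1/2}$$

as $n \to \infty$, where $h''(0) = -2$. Hence,

$$\int_0^{\infty}\frac{dx}{(1+x^2)^n} \sim \frac{1}{2}\sqrt{\frac{\pi}{n}} \quad \text{as } n \to \infty.$$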
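The integral in Exercise 12.10 has a closed form through the beta function, $\int_0^{\infty}(1+x^2)^{-n}\,dx = \frac{\sqrt{\pi}}{2}\,\Gamma(n - \frac{1}{2})/\Gamma(n)$. This standard identity is brought in here only to measure the Laplace approximation $\frac{1}{2}\sqrt{\pi/n}$. The sketch works on the log scale with `math.lgamma` to avoid overflow.

```python
import math

def exact(n):
    """integral_0^inf (1 + x^2)^(-n) dx = (1/2) * Gamma(1/2) * Gamma(n - 1/2) / Gamma(n)."""
    return 0.5 * math.exp(math.lgamma(0.5) + math.lgamma(n - 0.5) - math.lgamma(n))

def laplace(n):
    return 0.5 * math.sqrt(math.pi / n)

for n in (5, 50, 500):
    print(n, exact(n), laplace(n), exact(n) / laplace(n))
```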


12.11.

Number of            Approximate Value
Quadrature Points    of Variance
2                    0.1175
4                    0.2380
8                    0.2341
16                   0.1612
32                   0.1379
64                   0.1372

[Source: Example 16.6.1 in Lange (1999).]


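12.12. If $x > 0$, then

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0}e^{-t^2/2}\,dt + \frac{1}{\sqrt{2\pi}}\int_0^{x}e^{-t^2/2}\,dt = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_0^{x}e^{-t^2/2}\,dt.$$

Using Maclaurin's series (see Section 4.3), we obtain

$$\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_0^{x}\left[1 - \frac{1}{1!}\left(\frac{t^2}{2}\right) + \frac{1}{2!}\left(\frac{t^2}{2}\right)^2 - \frac{1}{3!}\left(\frac{t^2}{2}\right)^3 + \cdots + \frac{(-1)^n}{n!}\left(\frac{t^2}{2}\right)^n + \cdots\right]dt$$

$$= \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\left[x - \frac{x^3}{3\times2\times1!} + \frac{x^5}{5\times2^2\times2!} - \frac{x^7}{7\times2^3\times3!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)\times2^n\times n!} + \cdots\right]$$

$$= \frac{1}{2} + \frac{x}{\sqrt{2\pi}}\left[1 - \frac{x^2}{3\times2\times1!} + \frac{x^4}{5\times2^2\times2!} - \frac{x^6}{7\times2^3\times3!} + \cdots + (-1)^n\frac{x^{2n}}{(2n+1)\times2^n\times n!} + \cdots\right].$$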
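The series for $\Phi(x)$ can be checked against the standard identity $\Phi(x) = \frac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$, using Python's `math.erf` as the reference. This is a sketch: the number of terms kept is an arbitrary choice, and for large $x$ the alternating terms cancel heavily, so the truncated series is not a good algorithm far into the tail.

```python
import math

def phi_series(x, terms=60):
    """1/2 + x/sqrt(2*pi) * sum_{n>=0} (-1)^n x^(2n) / ((2n+1) * 2^n * n!)."""
    s = sum((-1) ** n * x ** (2 * n) / ((2 * n + 1) * 2**n * math.factorial(n))
            for n in range(terms))
    return 0.5 + x / math.sqrt(2 * math.pi) * s

def phi_exact(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (0.5, 1.0, 2.0, 3.0):
    print(x, phi_series(x), phi_exact(x))
```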


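12.13. From Exercise 12.12 we need to find values of $a$, $b$, $c$, $d$ such that, up to terms in $x^8$,

$$\left(1 + cx^2 + dx^4\right)\left(1 - \frac{x^2}{6} + \frac{x^4}{40} - \frac{x^6}{336} + \frac{x^8}{3456}\right) = 1 + ax^2 + bx^4.$$

Equating the coefficients of $x^2, x^4, x^6, x^8$ on both sides yields four equations in $a$, $b$, $c$, $d$. Solving these equations, we get

$$a = \frac{17}{468}, \qquad b = \frac{739}{196560}, \qquad c = \frac{95}{468}, \qquad d = \frac{55}{4368}.$$

[Note: For more details, see Morland (1998).]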
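The four linear equations of Exercise 12.13 can be solved in exact rational arithmetic with Python's `fractions.Fraction`, which confirms the stated values of $a$, $b$, $c$, $d$. In the sketch, $y = x^2$. The $y^3$ and $y^4$ equations determine $c$ and $d$ (solved by Cramer's rule), and the $y$ and $y^2$ equations then give $a$ and $b$.

```python
from fractions import Fraction
from math import factorial

# Coefficients s_k of y^k (y = x^2) in 1 - y/6 + y^2/40 - y^3/336 + y^4/3456 (Exercise 12.12).
s = [Fraction((-1) ** k, (2 * k + 1) * 2**k * factorial(k)) for k in range(5)]

# (1 + c y + d y^2)(s0 + s1 y + ... + s4 y^4) = 1 + a y + b y^2 through y^4, so
#   y^3:  s3 + c*s2 + d*s1 = 0
#   y^4:  s4 + c*s3 + d*s2 = 0
det = s[2] * s[2] - s[1] * s[3]
c = (s[1] * s[4] - s[2] * s[3]) / det
d = (s[3] * s[3] - s[2] * s[4]) / det

# The y and y^2 coefficients then give a and b.
a = s[1] + c * s[0]
b = s[2] + c * s[1] + d * s[0]
print(a, b, c, d)
```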


12.14.

$$y = G(x) = \int_0^x\frac{1}{4}(t+1)\,dt = \frac{x^2 + 2x}{8}, \qquad 0 \le x \le 2.$$

The only solution of $y = G(x)$ in $[0, 2]$ is $x = -1 + (1 + 8y)^{1/2}$, $0 \le y \le 1$. A random sample of size 150 from the $g(x)$ distribution is given by

$$x_i = -1 + (1 + 8y_i)^{1/2}, \qquad i = 1, 2, \ldots, 150,$$

where the $y_i$'s form a random sample from the $U(0, 1)$ distribution. Applying formula (12.43) to this sample, we get the estimate 16.6572. Using now formula (12.36), we obtain the estimate 17.8878, where the $u_i$'s form a random sample from the $U(0, 2)$ distribution.
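The inverse-CDF step of Exercise 12.14 can be sketched as follows. Uniform draws $y_i$ are mapped through $x = -1 + (1 + 8y)^{1/2}$, which inverts $G(x) = (x^2 + 2x)/8$, the CDF of $g(x) = (x+1)/4$ on $[0, 2]$. The seed is an arbitrary choice, and the sketch only generates the sample: the integrand used in formulas (12.43) and (12.36) is not reproduced here, so the estimates 16.6572 and 17.8878 are not recomputed.

```python
import math
import random

def G(x):
    """CDF of g(x) = (x + 1)/4 on [0, 2]."""
    return (x * x + 2 * x) / 8

def G_inv(y):
    """Unique root of G(x) = y in [0, 2], for 0 <= y <= 1."""
    return -1 + math.sqrt(1 + 8 * y)

rng = random.Random(12345)           # arbitrary seed, for reproducibility
sample = [G_inv(rng.random()) for _ in range(150)]
print(min(sample), max(sample), sum(sample) / len(sample))   # the mean of g is 7/6
```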


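12.15. As $n \to \infty$,

$$\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} = \left(1 + \frac{x^2}{n}\right)^{-n/2}\left(1 + \frac{x^2}{n}\right)^{-1/2} \to e^{-x^2/2}.$$

Furthermore, by applying Stirling's formula (formula 12.31), we obtain

$$\Gamma\left(\frac{n+1}{2}\right) \approx e^{-(n-1)/2}\left(\frac{n-1}{2}\right)^{(n-1)/2}\left(2\pi\,\frac{n-1}{2}\right)^{1/2},$$

$$\Gamma\left(\frac{n}{2}\right) \approx e^{-(n-2)/2}\left(\frac{n-2}{2}\right)^{(n-2)/2}\left(2\pi\,\frac{n-2}{2}\right)^{1/2}.$$

Hence,

$$\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \approx \frac{e^{-1/2}}{\sqrt{n\pi}}\left(1 + \frac{1}{n-2}\right)^{(n-2)/2}\left(\frac{n-1}{2}\right)^{1/2}\left(\frac{n-1}{n-2}\right)^{1/2} \approx \frac{e^{-1/2}}{\sqrt{n\pi}}\,e^{1/2}\left(\frac{n}{2}\right)^{1/2} = \frac{1}{\sqrt{2\pi}}.$$

Hence, for large $n$,

$$F(x) \approx \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt.$$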
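The two limits used in Exercise 12.15 can be checked numerically. The normalizing constant $\Gamma(\frac{n+1}{2})/[\sqrt{n\pi}\,\Gamma(\frac{n}{2})]$ should approach $1/\sqrt{2\pi}$, and the $t$ density at a fixed $x$ should approach the standard normal density. The sketch computes the constant on the log scale with `math.lgamma`. As a sanity check, $n = 1$ gives the Cauchy constant $1/\pi$.

```python
import math

def t_constant(n):
    """Gamma((n+1)/2) / (sqrt(n*pi) * Gamma(n/2)), computed on the log scale."""
    return math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi))

def t_density(x, n):
    return t_constant(n) * (1 + x * x / n) ** (-(n + 1) / 2)

target = 1 / math.sqrt(2 * math.pi)
for n in (5, 50, 5000):
    print(n, t_constant(n), target, t_density(1.0, n), target * math.exp(-0.5))
```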



General Bibliography 


Abramowitz, M., and I. A. Stegun (1964). Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. Wiley, New York.

Abramowitz, M., and I. A. Stegun, eds. (1972). Handbook of Mathematical Functions
with Formulas, Graphs, and Mathematical Tables. Wiley, New York.

Adby, P. R., and M. A. H. Dempster (1974). Introduction to Optimization Methods.
Chapman and Hall, London.

Agarwal, G. G., and W. J. Studden (1978). “Asymptotic design and estimation using
linear splines.” Comm. Statist. Simul. Comput., 7, 309-319.

Alvo, M., and P. Cabilio (2000). “Calculation of hypergeometric probabilities using
Chebyshev polynomials.” Amer. Statist., 54, 141-144.

Anderson, T. W., and S. D. Gupta (1963). “Some inequalities on characteristic roots
of matrices.” Biometrika, 50, 522-524.

Anderson, T. W., I. Olkin, and L. G. Underhill (1987). “Generation of random
orthogonal matrices.” SIAM J. Sci. Statist. Comput., 8, 625-629.

Anderson-Cook, C. M. (2000). “A second order model for cylindrical data.” J. Statist.
Comput. Simul., 66, 51-56.

Apostol, T. M. (1964). Mathematical Analysis. Addison-Wesley, Reading, Massachusetts.

Ash, A., and A. Hedayat (1978). “An introduction to design optimality with an
overview of the literature.” Comm. Statist. Theory Methods, 7, 1295-1325.

Atkinson, A. C. (1982). “Developments in the design of experiments.” Internat. Statist.
Rev., 50, 161-177.

Atkinson, A. C. (1988). “Recent developments in the methods of optimum and
related experimental designs.” Internat. Statist. Rev., 56, 99-115.

Basilevsky, A. (1983). Applied Matrix Algebra in the Statistical Sciences. North-Holland,
New York.

Bates, D. M., and D. G. Watts (1988). Nonlinear Regression Analysis and its
Applications. Wiley, New York.

Bayne, C. K., and I. B. Rubin (1986). Practical Experimental Designs and Optimization
Methods for Chemists. VCH Publishers, Deerfield Beach, Florida.

Bellman, R. (1970). Introduction to Matrix Analysis, 2nd ed. McGraw-Hill, New York.


Belsley, D. A., E. Kuh, and R. E. Welsch (1980). Regression Diagnostics. Wiley, New
York.

Bickel, P. J., and K. A. Doksum (1977). Mathematical Statistics. Holden-Day, San
Francisco.

Biles, W. E., and J. J. Swain (1980). Optimization and Industrial Experimentation.
Wiley-Interscience, New York.

Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. Wiley, New
York.

Blyth, C. R. (1990). “Minimizing the sum of absolute deviations.” Amer. Statist., 44,
329.

Bohachevsky, I. O., M. E. Johnson, and M. L. Stein (1986). “Generalized simulated
annealing for function optimization.” Technometrics, 28, 209-217.

Box, G. E. P. (1982). “Choice of response surface design and alphabetic optimality.”
Utilitas Math., 21B, 11-55.

Box, G. E. P., and D. W. Behnken (1960). “Some new three level designs for the study
of quantitative variables.” Technometrics, 2, 455-475.

Box, G. E. P., and D. R. Cox (1964). “An analysis of transformations.” J. Roy. Statist.
Soc. Ser. B, 26, 211-243.

Box, G. E. P., and N. R. Draper (1959). “A basis for the selection of a response
surface design.” J. Amer. Statist. Assoc., 54, 622-654.

Box, G. E. P., and N. R. Draper (1963). “The choice of a second order rotatable
design.” Biometrika, 50, 335-352.

Box, G. E. P., and N. R. Draper (1965). “The Bayesian estimation of common
parameters from several responses.” Biometrika, 52, 355-365.

Box, G. E. P., and N. R. Draper (1987). Empirical Model-Building and Response
Surfaces. Wiley, New York.

Box, G. E. P., and H. L. Lucas (1959). “Design of experiments in nonlinear situations.”
Biometrika, 46, 77-90.

Box, G. E. P., and K. B. Wilson (1951). “On the experimental attainment of optimum
conditions.” J. Roy. Statist. Soc. Ser. B, 13, 1-45.

Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters.
Wiley, New York.

Boyer, C. B. (1968). A History of Mathematics. Wiley, New York.

Bronshtein, I. N., and K. A. Semendyayev (1985). Handbook of Mathematics. (English
translation edited by K. A. Hirsch.) Van Nostrand Reinhold, New York.

Brownlee, K. A. (1965). Statistical Theory and Methodology, 2nd ed. Wiley, New York.

Buck, R. C. (1956). Advanced Calculus. McGraw-Hill, New York.

Bunday, B. D. (1984). Basic Optimization Methods. Edward Arnold, Victoria, Australia.

Buse, A., and L. Lim (1977). “Cubic splines as a special case of restricted least
squares.” J. Amer. Statist. Assoc., 72, 64-68.

Bush, K. A., and I. Olkin (1959). “Extrema of quadratic forms with applications to
statistics.” Biometrika, 46, 483-486.


Carslaw, H. S. (1930). Introduction to the Theory of Fourier Series and Integrals, 3rd ed.
Dover, New York.

Chatterjee, S., and B. Price (1977). Regression Analysis by Example. Wiley, New York.

Cheney, E. W. (1982). Introduction to Approximation Theory, 2nd ed. Chelsea, New
York.

Chihara, T. S. (1978). An Introduction to Orthogonal Polynomials. Gordon and Breach,
New York.

Churchill, R. V. (1963). Fourier Series and Boundary Value Problems, 2nd ed.
McGraw-Hill, New York.

Cochran, W. G. (1963). Sampling Techniques, 2nd ed. Wiley, New York.

Conlon, M. (1991). “The controlled random search procedure for function optimization.”
Personal communication.

Conlon, M., and A. I. Khuri (1992). “Multiple response optimization.” Technical
Report, Department of Statistics, University of Florida, Gainesville, Florida.

Cook, R. D., and C. J. Nachtsheim (1980). “A comparison of algorithms for constructing
exact D-optimal designs.” Technometrics, 22, 315-324.

Cooke, W. P. (1988). “L’Hôpital’s rule in a Poisson derivation.” Amer. Math. Monthly,
95, 253-254.

Copson, E. T. (1965). Asymptotic Expansions. Cambridge University Press, London.

Cornish, E. A., and R. A. Fisher (1937). “Moments and cumulants in the specification
of distributions.” Rev. Internat. Statist. Inst., 5, 307-320.

Corwin, L. J., and R. H. Szczarba (1982). Multivariable Calculus. Marcel Dekker, New
York.

Courant, R., and F. John (1965). Introduction to Calculus and Analysis, Vol. 1. Wiley,
New York.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press,
Princeton.

Daniels, H. (1954). “Saddlepoint approximation in statistics.” Ann. Math. Statist., 25,
631-650.

Dasgupta, P. (1968). “An approximation to the distribution of sample correlation
coefficient, when the population is non-normal.” Sankhyā, Ser. B, 30, 425-428.

Davis, H. F. (1963). Fourier Series and Orthogonal Functions. Allyn & Bacon, Boston.

Davis, P. J. (1975). Interpolation and Approximation. Dover, New York.

Davis, P. J., and P. Rabinowitz (1975). Methods of Numerical Integration. Academic
Press, New York.

De Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York.

De Bruijn, N. G. (1961). Asymptotic Methods in Analysis, 2nd ed. North-Holland,
Amsterdam.

DeCani, J. S., and R. A. Stine (1986). “A note on deriving the information matrix for
a logistic distribution.” Amer. Statist., 40, 220-222.

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). “Maximum likelihood from
incomplete data via the EM algorithm.” J. Roy. Statist. Soc. Ser. B, 39, 1-38.

Divgi, D. R. (1979). “Calculation of univariate and bivariate normal probability
functions.” Ann. Statist., 7, 903-910.


Draper, N. R. (1963). “Ridge analysis of response surfaces.” Technometrics, 5, 469-479.

Draper, N. R., and A. M. Herzberg (1987). “A ridge-regression sidelight.” Amer.
Statist., 41, 282-283.

Draper, N. R., and H. Smith (1981). Applied Regression Analysis, 2nd ed. Wiley, New
York.

Draper, N. R., I. Guttman, and P. Lipow (1977). “All-bias designs for spline functions
joined at the axes.” J. Amer. Statist. Assoc., 72, 424-429.

Dugundji, J. (1966). Topology. Allyn and Bacon, Boston.

Durbin, J., and G. S. Watson (1950). “Testing for serial correlation in least squares
regression, I.” Biometrika, 37, 409-428.

Durbin, J., and G. S. Watson (1951). “Testing for serial correlation in least squares
regression, II.” Biometrika, 38, 159-178.

Eggermont, P. P. B. (1988). “Noncentral difference quotients and the derivative.”
Amer. Math. Monthly, 95, 551-553.

Eubank, R. L. (1984). “Approximate regression models and splines.” Comm. Statist.
Theory Methods, 13, 433-484.

Evans, M., and T. Swartz (1995). “Methods for approximating integrals in statistics
with special emphasis on Bayesian integration problems.” Statist. Sci., 10, 254-272.

Everitt, B. S. (1987). Introduction to Optimization Methods and Their Application in
Statistics. Chapman and Hall, London.

Eves, H. (1976). An Introduction to the History of Mathematics, 4th ed. Holt, Rinehart
and Winston, New York.

Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York.

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I, 3rd
ed. Wiley, New York.

Fettis, H. E. (1976). “Fourier series expansions for Pearson Type IV distributions and
probabilities.” SIAM J. Applied Math., 31, 511-518.

Fichtali, J., F. R. Van De Voort, and A. I. Khuri (1990). “Multiresponse optimization
of acid casein production.” J. Food Process Eng., 12, 247-258.

Fisher, R. A., and E. Cornish (1960). “The percentile points of distribution having
known cumulants.” Technometrics, 2, 209-225.

Fisher, R. A., A. S. Corbet, and C. B. Williams (1943). “The relation between the
number of species and the number of individuals in a random sample of an
animal population.” J. Anim. Ecology, 12, 42-58.

Fisz, M. (1963). Probability Theory and Mathematical Statistics, 3rd ed. Wiley, New
York.

Fletcher, R. (1987). Practical Methods of Optimization, 2nd ed. Wiley, New York.

Fletcher, R., and M. J. D. Powell (1963). “A rapidly convergent descent method for
minimization.” Comput. J., 6, 163-168.

Flournoy, N., and R. K. Tsutakawa, eds. (1991). Statistical Multiple Integration,
Contemporary Mathematics 115. Amer. Math. Soc., Providence, Rhode Island.

Freud, G. (1971). Orthogonal Polynomials. Pergamon Press, Oxford.

Fulks, W. (1978). Advanced Calculus, 3rd ed. Wiley, New York.

Fuller, W. A. (1976). Introduction to Statistical Time Series. Wiley, New York.


Gallant, A. R., and W. A. Fuller (1973). “Fitting segmented polynomial regression
models whose join points have to be estimated.” J. Amer. Statist. Assoc., 68,
144-147.

Gantmacher, F. R. (1959). The Theory of Matrices, Vols. I and II. Chelsea, New York.

Georgiev, A. A. (1984). “Kernel estimates of functions and their derivatives with
applications.” Statist. Prob. Lett., 2, 45-50.

Geyer, C. J. (1992). “Practical Markov chain Monte Carlo.” Statist. Sci., 7, 473-511.

Ghazal, G. A. (1994). “Moments of the ratio of two dependent quadratic forms.”
Statist. Prob. Lett., 20, 313-319.

Gibaldi, M., and D. Perrier (1982). Pharmacokinetics, 2nd ed. Dekker, New York.

Gillespie, R. P. (1954). Partial Differentiation. Oliver and Boyd, Edinburgh.

Gillespie, R. P. (1959). Integration. Oliver and Boyd, London.

Golub, G. H., and C. F. Van Loan (1983). Matrix Computations. Johns Hopkins
University Press, Baltimore.

Good, I. J. (1969). “Some applications of the singular decomposition of a matrix.”
Technometrics, 11, 823-831.

Goutis, C., and G. Casella (1999). “Explaining the saddlepoint approximation.” Amer.
Statist., 53, 216-224.

Graybill, F. A. (1961). An Introduction to Linear Statistical Models, Vol. I. McGraw-Hill,
New York.

Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury, North
Scituate, Massachusetts.

Graybill, F. A. (1983). Matrices with Applications in Statistics, 2nd ed. Wadsworth,
Belmont, California.

Gurland, J. (1953). “Distributions of quadratic forms and ratios of quadratic forms.”
Ann. Math. Statist., 24, 416-427.

Haber, S. (1970). “Numerical evaluation of multiple integrals.” SIAM Rev., 12,
481-526.

Hall, C. A. (1968). “On error bounds for spline interpolation.” J. Approx. Theory, 1,
209-218.

Hall, C. A., and W. W. Meyer (1976). “Optimal error bounds for cubic spline
interpolation.” J. Approx. Theory, 16, 105-122.

Hardy, G. H. (1955). A Course of Pure Mathematics, 10th ed. The University Press,
Cambridge, England.

Hardy, G. H., J. E. Littlewood, and G. Pólya (1952). Inequalities, 2nd ed. Cambridge
University Press, Cambridge, England.

Harris, B. (1966). Theory of Probability. Addison-Wesley, Reading, Massachusetts.

Hartig, D. (1991). “L’Hôpital’s rule via integration.” Amer. Math. Monthly, 98,
156-157.

Hartley, H. O., and J. N. K. Rao (1967). “Maximum likelihood estimation for the
mixed analysis of variance model.” Biometrika, 54, 93-108.

Healy, M. J. R. (1986). Matrices for Statistics. Clarendon Press, Oxford, England.


Heiberger, R. M., P. F. Velleman, and M. A. Ypelaar (1983). “Generating test data
with independently controllable features for multivariate general linear forms.”
J. Amer. Statist. Assoc., 78, 585-595.

Henderson, H. V., and S. R. Searle (1981). “The vec-permutation matrix, the vec
operator and Kronecker products: A review.” Linear and Multilinear Algebra, 9,
271-288.

Henderson, H. V., F. Pukelsheim, and S. R. Searle (1983). “On the history of the
Kronecker product.” Linear and Multilinear Algebra, 14, 113-120.

Henle, J. M., and E. M. Kleinberg (1979). Infinitesimal Calculus. The MIT Press,
Cambridge, Massachusetts.

Hillier, F. S., and G. J. Lieberman (1967). Introduction to Operations Research.
Holden-Day, San Francisco.

Hirschman, I. I., Jr. (1962). Infinite Series. Holt, Rinehart and Winston, New York.

Hoerl, A. E. (1959). “Optimum solution of many variables equations.” Chem. Eng.
Prog., 55, 69-78.

Hoerl, A. E., and R. W. Kennard (1970a). “Ridge regression: Biased estimation for
non-orthogonal problems.” Technometrics, 12, 55-67.

Hoerl, A. E., and R. W. Kennard (1970b). “Ridge regression: Applications to
non-orthogonal problems.” Technometrics, 12, 69-82; correction, 12, 723.

Hogg, R. V., and A. T. Craig (1965). Introduction to Mathematical Statistics, 2nd ed.
Macmillan, New York.

Huber, P. J. (1973). “Robust regression: Asymptotics, conjectures and Monte Carlo.”
Ann. Statist., 1, 799-821.

Huber, P. J. (1981). Robust Statistics. Wiley, New York.

Hyslop, J. M. (1954). Infinite Series, 5th ed. Oliver and Boyd, Edinburgh, England.

Jackson, D. (1941). Fourier Series and Orthogonal Polynomials. Mathematical Association
of America, Washington.

James, A. T. (1954). “Normal multivariate analysis and the orthogonal group.” Ann.
Math. Statist., 25, 40-75.

James, A. T., and R. A. J. Conyers (1985). “Estimation of a derivative by a difference
quotient: Its application to hepatocyte lactate metabolism.” Biometrics, 41,
467-476.

Johnson, M. E., and C. J. Nachtsheim (1983). “Some guidelines for constructing exact
D-optimal designs on convex design spaces.” Technometrics, 25, 271-277.

Johnson, N. L., and S. Kotz (1968). “Tables of distributions of positive definite
quadratic forms in central normal variables.” Sankhyā, Ser. B, 30, 303-314.

Johnson, N. L., and S. Kotz (1969). Discrete Distributions. Houghton Mifflin, Boston.

Johnson, N. L., and S. Kotz (1970a). Continuous Univariate Distributions-1. Houghton
Mifflin, Boston.

Johnson, N. L., and S. Kotz (1970b). Continuous Univariate Distributions-2. Houghton
Mifflin, Boston.

Johnson, P. E. (1972). A History of Set Theory. Prindle, Weber, and Schmidt, Boston.

Jones, E. R., and T. J. Mitchell (1978). “Design criteria for detecting model
inadequacy.” Biometrika, 65, 541-551.


Judge, G. G., W. E. Griffiths, R. C. Hill, and T. C. Lee (1980). The Theory and
Practice of Econometrics. Wiley, New York.

Kahaner, D. K. (1991). “A survey of existing multidimensional quadrature routines.” In
Statistical Multiple Integration, Contemporary Mathematics 115, N. Flournoy and
R. K. Tsutakawa, eds. Amer. Math. Soc., Providence, pp. 9-22.

Kaplan, W. (1991). Advanced Calculus, 4th ed. Addison-Wesley, Redwood City,
California.

Kaplan, W., and D. J. Lewis (1971). Calculus and Linear Algebra, Vol. II. Wiley, New
York.

Karson, M. J., A. R. Manson, and R. J. Hader (1969). “Minimum bias estimation and
experimental design for response surfaces.” Technometrics, 11, 461-475.

Kass, R. E., L. Tierney, and J. B. Kadane (1991). “Laplace’s method in Bayesian
analysis.” In Statistical Multiple Integration, Contemporary Mathematics 115,
N. Flournoy and R. K. Tsutakawa, eds. Amer. Math. Soc., Providence, pp. 89-99.

Katz, D., and D. Z. D’Argenio (1983). “Experimental design for estimating integrals
by numerical quadrature, with applications to pharmacokinetic studies.” Biometrics,
39, 621-628.

Kawata, T. (1972). Fourier Analysis in Probability Theory. Academic Press, New York.

Kendall, M., and A. Stuart (1977). The Advanced Theory of Statistics, Vol. 1, 4th ed.
Macmillan, New York.

Kerridge, D. F., and G. W. Cook (1976). “Yet another series for the normal integral.”
Biometrika, 63, 401-403.

Khuri, A. I. (1982). “Direct products: A powerful tool for the analysis of balanced
data.” Comm. Statist. Theory Methods, 11, 2903-2920.

Khuri, A. I. (1984). “Interval estimation of fixed effects and of functions of variance
components in balanced mixed models.” Sankhyā, Ser. B, 46, 10-28.

Khuri, A. I., and G. Casella (2002). “The existence of the first negative moment
revisited.” Amer. Statist., 56, 44-47.

Khuri, A. I., and M. Conlon (1981). “Simultaneous optimization of multiple responses
represented by polynomial regression functions.” Technometrics, 23, 363-375.

Khuri, A. I., and J. A. Cornell (1996). Response Surfaces, 2nd ed. Marcel Dekker,
New York.

Khuri, A. I., and I. J. Good (1989). “The parameterization of orthogonal matrices: A
review mainly for statisticians.” South African Statist. J., 23, 231-250.

Khuri, A. I., and R. H. Myers (1979). “Modified ridge analysis.” Technometrics, 21,
467-473.

Khuri, A. I., and R. H. Myers (1981). “Design related robustness of tests in regression
models.” Comm. Statist. Theory Methods, 10, 223-235.

Khuri, A. I., and H. Sahai (1985). “Variance components analysis: A selective
literature survey.” Internat. Statist. Rev., 53, 279-300.

Kiefer, J. (1958). “On the nonrandomized optimality and the randomized nonoptimality
of symmetrical designs.” Ann. Math. Statist., 29, 675-699.

Kiefer, J. (1959). “Optimum experimental designs” (with discussion). J. Roy. Statist.
Soc., Ser. B, 21, 272-319.


Kiefer, J. (1960). “Optimum experimental designs V, with applications to systematic 
and rotatable designs.” In Proceedings of the Fourth Berkeley Symposium on 
Mathematical Statistics and Probability, Vol. 1. University of California Press, 
Berkeley, pp. 381-405. 

Kiefer, J. (1961). “Optimum designs in régression problems WP Ann. Math. Statist., 
32 , 298-325. 

Kiefer, J. (1962a). “Two more criteria équivalent to Z)-optimality of designs.” Ann. 
Math. Statist., 33, 792-796. 

Kiefer, J. (1962b). “An extremum resuit.” Canad. J. Math., 14 , 597-601. 

Kiefer, J. (1975). “Optimal design: Variation in structure and performance under 
change of criterion.” Biometrika, 62 , 277-288. 

Kiefer, J., and J. Wolfowitz (1960). “The équivalence of two extremum problems.” 
Canad. J. Math., 12 , 363-366. 

Kirkpatrick, S., C. D. Gelatt, and M. P. Vechhi (1983). “Optimization by simulated 
annealing,” Science, 220, 671-680. 

Knopp, K. (1951). Theory and Application of Infinité Sériés. Blackie and Son, London. 

Kosambi, D. D. (1949). “Characteristic properties of sériés distributions.” Proc. Nat. 
Inst. Sci. India, 15 , 109-113. 

Krylov, V. I. (1962). Approximate Calculation of Intégrais. Macmillan, New York. 

Kufner, A., and J. Kadlec (1971). Fourier Sériés. Iliffe Books — The Butterworth 
Group, London. 

Kupper, L. L. (1972). “Fourier sériés and spherical harmonies régression.” Appl. 
Statist., 21 , 121-130. 

Kupper, L. L. (1973). “Minimax designs for Fourier sériés and spherical harmonies 
régressions: A characterization of rotatable arrangements.” J. Roy. Statist. Soc., 
Ser. B, 35 , 493-500. 

Lancaster, P. (1969). Theory of Matrices. Academie Press, New York. 

Lancaster, P., and K. Salkauskas (1986). Curue and Surface Fitting. Academie Press, 
London. 

Lange, K. (1999). Numerical Analysis for Statisticians. Springer, New York. 

Lehmann, E. L. (1983). Theory of Point Estimation. Wiley, New York. 

Lieberman, G. J., and Owen, D. B. (1961). Tables of the Hypergeometric Probability 
Distribution. Stanford University Press, Palo Alto, California. 

Lieberman, O. (1994). “A Laplace approximation to the moments of a ratio of 
quadratic forms.” Biometrika, 81 , 681-690. 

Lindgren, B. W. (1976). Statistical Theory, 3rd ed. Macmillan, New York. 

Little, R. J. A., and D. B. Rubin (1987). Statistical Analysis with Missing Data. Wiley, 
New York. 

Liu, Q., and D. A. Pierce (1994). “A note on Gauss-Hermite quadrature.” Biometrika, 
81 , 624-629. 

Lloyd, E. (1980). Handbook of Applicable Mathematics, Vol. IL Wiley, New York. 

Lowerre, J. M. (1982). “An introduction to modern matrix methods and statistics.”
Amer. Statist., 36, 113-115.



Lowerre, J. M. (1983). “An integral of the bivariate normal and an application.”
Amer. Statist., 37, 235-236. 

Lucas, J. M. (1976). “Which response surface design is best.” Technometrics, 18, 
411-417. 

Luceno, A. (1997). “Further evidence supporting the numerical usefulness of characteristic
functions.” Amer. Statist., 51, 233-234.

Lukacs, E. (1970). Characteristic Functions, 2nd ed. Hafner, New York. 

Magnus, J. R., and H. Neudecker (1988). Matrix Differential Calculus with Applications
in Statistics and Econometrics. Wiley, New York. 

Mandel, J. (1982). “Use of the singular-value decomposition in regression analysis.”
Amer. Statist., 36, 15-24. 

Marcus, M., and H. Minc (1988). Introduction to Linear Algebra. Dover, New York.

Mardia, K. V., and T. W. Sutton (1978). “Model for cylindrical variables with 
applications.” J. Roy. Statist. Soc., Ser. B, 40, 229-233. 

Marsaglia, G., and G. P. H. Styan (1974). “Equalities and inequalities for ranks of 
matrices.” Linear and Multilinear Algebra, 2, 269-292.

May, W. G. (1970). Linear Algebra. Scott, Foresman and Company, Glenview, Illinois. 

McCullagh, P. (1994). “Does the moment-generating function characterize a
distribution?” Amer. Statist., 48, 208.

Menon, V. V., B. Prasad, and R. S. Singh (1984). “Non-parametric recursive estimates
of a probability density function and its derivatives.” J. Statist. Plann. Inference, 9,
73-82.

Miller, R. G., Jr. (1981). Simultaneous Statistical Inference, 2nd ed. Springer, New 
York. 

Milliken, G. A., and D. E. Johnson (1984). Analysis of Messy Data. Lifetime Learning
Publications, Belmont, California. 

Mitchell, T. J. (1974). “An algorithm for the construction of D-optimal experimental 
designs.” Technometrics, 16, 203-210. 

Montgomery, D. C., and E. A. Peck (1982). Introduction to Linear Regression Analysis.
Wiley, New York. 

Moran, P. A. P. (1968). An Introduction to Probability Theory. Clarendon Press, 
Oxford. 

Morin, D. (1992). “Exact moments of ratios of quadratic forms.” Metron, 50, 59-78. 

Morland, T. (1998). “Approximations to the normal distribution function.” Math. 
Gazette, 82, 431-437. 

Morrison, D. F. (1967). Multivariate Statistical Methods. McGraw-Hill, New York. 

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. 

Myers, R. H. (1976). Response Surface Methodology. Author, Blacksburg, Virginia. 

Myers, R. H. (1990). Classical and Modern Regression with Applications, 2nd ed.
PWS-Kent, Boston. 

Myers, R. H., and W. H. Carter, Jr. (1973). “Response surface techniques for dual 
response systems.” Technometrics, 15, 301-317.

Myers, R. H., and A. I. Khuri (1979). “A new procedure for steepest ascent.” Comm. 
Statist. Theory Methods, 8, 1359-1376. 



Myers, R. H., A. I. Khuri, and W. H. Carter, Jr. (1989). “Response surface methodology:
1966-1988.” Technometrics, 31, 137-157.

Nelder, J. A., and R. Mead (1965). “A simplex method for function minimization.” 
Comput. J., 7, 308-313.

Nelson, L. S. (1973). “A sequential simplex procedure for non-linear least-squares 
estimation and other function minimization problems.” In 27th Annual Technical
Conference Transactions, American Society for Quality Control, pp. 107-117.

Newcomb, R. W. (1960). “On the simultaneous diagonalization of two semidefinite 
matrices.” Quart. Appl. Math., 19, 144-146.

Nonweiler, T. R. F. (1984). Computational Mathematics. Ellis Horwood, Chichester, 
England. 

Nurcombe, J. R. (1979). “A sequence of convergence tests.” Amer. Math. Monthly, 86,
679-681. 

Ofir, C., and A. I. Khuri (1986). “Multicollinearity in marketing models: Diagnostics 
and remedial measures.” Internat. J. Res. Market., 3, 181-205.

Olkin, I. (1990). “Interface between statistics and linear algebra.” In Matrix Theory 
and Applications, Vol. 40, C. R. Johnson, ed., American Mathematical Society, 
Providence, pp. 233-256. 

Olsson, D. M., and L. S. Nelson (1975). “The Nelder-Mead simplex procedure for 
function minimization.” Technometrics, 17, 45-51. 

Otnes, R. K., and L. Enochson (1978). Applied Time Series Analysis. Wiley, New York.

Park, S. H. (1978). “Experimental designs for fitting segmented polynomial regression
models.” Technometrics, 20, 151-154. 

Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco. 

Pazman, A. (1986). Foundations of Optimum Experimental Design. Reidel, Dordrecht, 
Holland. 

Pérez-Abreu, V. (1991). “Poisson approximation to power series distributions.” Amer.
Statist., 45, 42-45. 

Pfeiffer, P. E. (1990). Probability for Applications. Springer, New York. 

Phillips, C., and B. Cornelius (1986). Computational Numerical Methods. Ellis
Horwood, Chichester, England. 

Phillips, G. M., and P. J. Taylor (1973). Theory and Applications of Numerical Analysis. 
Academic Press, New York.

Piegorsch, W. W., and A. J. Bailer (1993). “Minimum mean-square error quadrature.” 
J. Statist. Comput. Simul., 46, 217-234.

Piegorsch, W. W., and G. Casella (1985). “The existence of the first negative
moment.” Amer. Statist., 39, 60-62. 

Pinkus, A., and S. Zafrany (1997). Fourier Series and Integral Transforms. Cambridge
University Press, Cambridge, England. 

Plackett, R. L., and J. P. Burman (1946). “The design of optimum multifactorial 
experiments.” Biometrika, 33, 305-325. 

Poirier, D. J. (1973). “Piecewise regression using cubic splines.” J. Amer. Statist.
Assoc., 68, 515-524.

Powell, M. J. D. (1967). “On the maximum errors of polynomial approximations 
defined by interpolation and by least squares criteria.” Comput. J., 9, 404-407. 



Prenter, P. M. (1975). Splines and Variational Methods. Wiley, New York. 

Price, G. B. (1947). “Some identities in the theory of determinants.” Amer. Math.
Monthly, 54, 75-90. 

Price, W. L. (1977). “A controlled random search procedure for global optimization.” 
Comput. J., 20, 367-370.

Pye, W. C., and P. G. Webster (1989). “A note on Raabe’s test extended.” Math. 
Comput. Ed., 23, 125-128. 

Ralston, A., and P. Rabinowitz (1978). A First Course in Numerical Analysis. 
McGraw-Hill, New York. 

Ramsay, J. O. (1988). “Monotone regression splines in action.” Statist. Sci., 3,
425-461. 

Randles, R. H., and D. A. Wolfe (1979). Introduction to the Theory of Nonparametric 
Statistics. Wiley, New York. 

Rao, C. R. (1970). “Estimation of heteroscedastic variances in linear models.” 
J. Amer. Statist. Assoc., 65, 161-172. 

Rao, C. R. (1971). “Estimation of variance and covariance components — MINQUE 
theory.” J. Multivariate Anal., 1, 257-275.

Rao, C. R. (1972). “Estimation of variance and covariance components in linear 
models.” J. Amer. Statist. Assoc., 67, 112-115.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New
York. 

Reid, W. H., and S. J. Skates (1986). “On the asymptotic approximation of integrals.”
SIAM J. Appl. Math., 46, 351-358.

Rice, J. R. (1969). The Approximation of Functions, Vol. 2. Addison-Wesley, Reading, 
Massachusetts. 

Rivlin, T. J. (1969). An Introduction to the Approximation of Functions. Dover, New 
York. 

Rivlin, T. J. (1990). Chebyshev Polynomials, 2nd ed. Wiley, New York. 

Roberts, A. W., and D. E. Varberg (1973). Convex Functions. Academic Press, New
York. 

Rogers, G. S. (1984). “Kronecker products in ANOVA — A first step.” Amer. Statist., 
38, 197-202. 

Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley, 
Reading, Massachusetts. 

Rudin, W. (1964). Principles of Mathematical Analysis, 2nd ed. McGraw-Hill, New 
York. 

Rustagi, J. S., ed. (1979). Optimizing Methods in Statistics. Academic Press, New York.

Sagan, H. (1974). Advanced Calculus. Houghton Mifflin, Boston.

Satterthwaite, F. E. (1946). “An approximate distribution of estimates of variance
components.” Biometrics Bull., 2, 110-114.

Scheffé, H. (1959). The Analysis of Variance. Wiley, New York. 

Schoenberg, I. J. (1946). “Contributions to the problem of approximation of equidistant
data by analytic functions.” Quart. Appl. Math., 4, 45-99; 112-141.



Schöne, A., and W. Schmid (2000). “On the joint distribution of a quadratic and a
linear form in normal variables.” J. Mult. Anal., 72, 163-182. 

Schwartz, S. C. (1967). “Estimation of probability density by an orthogonal series.”
Ann. Math. Statist., 38, 1261-1265. 

Searle, S. R. (1971). Linear Models. Wiley, New York. 

Searle, S. R. (1982). Matrix Algebra Useful for Statistics. Wiley, New York. 

Seber, G. A. F. (1984). Multivariate Observations. Wiley, New York. 

Shoup, T. E. (1984). Applied Numerical Methods for the Microcomputer. Prentice-Hall, 
Englewood Cliffs, New Jersey. 

Silvey, S. D. (1980). Optimal Designs. Chapman and Hall, London. 

Smith, D. E. (1958). History of Mathematics, Vol. I. Dover, New York. 

Smith, P. L. (1979). “Splines as a useful and convenient statistical tool.” Amer. Statist.,
33, 57-62. 

Spendley, W., G. R. Hext, and F. R. Himsworth (1962). “Sequential application of 
simplex designs in optimization and evolutionary operation.” Technometrics, 4, 
441-461. 

St. John, R. C., and N. R. Draper (1975). “D-optimality for regression designs: A
review.” Technometrics, 17, 15-23. 

Stark, P. A. (1970). Introduction to Numerical Methods. Macmillan, London. 

Stoll, R. R. (1963). Set Theory and Logic. W. H. Freeman, San Francisco. 

Strawderman, R. L. (2000). “Higher-order asymptotic approximation: Laplace, saddlepoint,
and related methods.” J. Amer. Statist. Assoc., 95, 1358-1364.

Stroud, A. H. (1971). Approximate Calculation of Multiple Integrals. Prentice-Hall,
Englewood Cliffs, New Jersey. 

Subrahmaniam, K. (1966). “Some contributions to the theory of non-normality — I 
(univariate case).” Sankhyā, Ser. A, 28, 389-406.

Sutradhar, B. C., and R. F. Bartlett (1989). “An approximation to the distribution of 
the ratio of two general quadratic forms with application to time series valued
designs.” Comm. Statist. Theory Methods, 18, 1563-1588. 

Swallow, W. H., and S. R. Searle (1978). “Minimum variance quadratic unbiased 
estimation (MIVQUE) of variance components.” Technometrics, 20, 265-272. 

Szegő, G. (1975). Orthogonal Polynomials, 4th ed. Amer. Math. Soc., Providence,
Rhode Island. 

Szidarovszky, F., and S. Yakowitz (1978). Principles and Procedures of Numerical 
Analysis. Plenum Press, New York.

Taylor, A. E., and W. R. Mann (1972). Advanced Calculus, 2nd ed. Wiley, New York.

Thibaudeau, Y., and G. P. H. Styan (1985). “Bounds for Chakrabarti’s measure of 
imbalance in experimental design.” In Proceedings of the First International 
Tampere Seminar on Linear Statistical Models and Their Applications, T. Pukkila 
and S. Puntanen, eds. University of Tampere, Tampere, Finland, pp. 323-347. 

Thiele, T. N. (1903). Theory of Observations. Layton, London. Reprinted in Ann. 
Math. Statist. (1931), 2, 165-307. 



Tierney, L., R. E. Kass, and J. B. Kadane (1989). “Fully exponential Laplace 
approximations to expectations and variances of nonpositive functions.” J. Amer.
Statist. Assoc., 84, 710-716.

Tiku, M. L. (1964a). “Approximating the general non-normal variance ratio sampling 
distributions.” Biometrika, 51, 83-95.

Tiku, M. L. (1964b). “A note on the negative moments of a truncated Poisson
variate.” J. Amer. Statist. Assoc., 59, 1220-1224.

Tolstov, G. P. (1962). Fourier Series. Dover, New York. (Translated from the Russian
by Richard A. Silverman.) 

Tucker, H. G. (1962). Probability and Mathematical Statistics. Academic Press, New
York. 

Vilenkin, N. Y. (1968). Stories about Sets. Academic Press, New York.

Viskov, O. V. (1992). “Some remarks on Hermite polynomials.” Theory Prob. Appl.,
36, 633-637.

Waller, L. A. (1995). “Does the characteristic function numerically distinguish
distributions?” Amer. Statist., 49, 150-152.

Waller, L. A., B. W. Turnbull, and J. M. Hardin (1995). “Obtaining distribution 
functions by numerical inversion of characteristic functions with applications.” 
Amer. Statist., 49, 346-350.

Watson, G. S. (1964). “A note on maximum likelihood.” Sankhyā, Ser. A, 26,
303-304.

Weaver, H. J. (1989). Theory of Discrete and Continuous Fourier Analysis. Wiley, New
York. 

Wegman, E. J., and I. W. Wright (1983). “Splines in statistics.” J. Amer. Statist. 
Assoc., 78, 351-365.

Wen, L. (2001). “A counterexample for the two-dimensional density function.”
Amer. Math. Monthly, 108, 367-368.

Wetherill, G. B., P. Duncombe, M. Kenward, J. Köllerström, S. R. Paul, and B. J.
Vowden (1986). Regression Analysis with Applications. Chapman and Hall,
London. 

Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. 

Withers, C. S. (1984). “Asymptotic expansions for distributions and quantiles with
power series cumulants.” J. Roy. Statist. Soc., Ser. B, 46, 389-396.

Wold, S. (1974). “Spline functions in data analysis.” Technometrics, 16, 1-11.

Wolkowicz, H., and G. P. H. Styan (1980). “Bounds for eigenvalues using traces.” 
Linear Algebra Appl., 29, 471-506.

Wong, R. (1989). Asymptotic Approximations of Integrals. Academic Press, New York.

Woods, J. D., and H. O. Posten (1977). “The use of Fourier series in the evaluation of
probability distribution functions.” Comm. Statist. — Simul. Comput., 6, 201-219. 

Wynn, H. P. (1970). “The sequential generation of D-optimum experimental designs.”
Ann. Math. Statist., 41, 1655-1664.

Wynn, H. P. (1972). “Results in the theory and construction of D-optimum
experimental designs.” J. Roy. Statist. Soc., Ser. B, 34, 133-147.

Zanakis, S. H., and J. S. Rustagi, eds. (1982). Optimization in Statistics. North-Holland, 
Amsterdam. 

Zaring, W. M. (1967). An Introduction to Analysis. Macmillan, New York.



Index 


Absolute error loss, 86 
Absolutely continuous 
function, 91 

random variable, 116, 239 
Addition of matrices, 29 
Adjoint, 34 
Adjugate, 34 
Admissible estimator, 86 
Approximation of 
density functions, 456 
functions, 403, 495 
integrals, 517, 533
normal integral, 460, 469
quantiles of distributions, 456, 458, 468 
Arcsin x, 6, 78 
Asymptotically equal, 66, 147 
Asymptotic distributions, 120 
Average squared bias, 360, 431 
Average variance, 360, 400 

Balanced mixed model, 43 
Behrens-Fisher test, 324 
Bernoulli sequence, 14 
Bernstein polynomial, 406, 410 
Best linear unbiased estimator, 312 
Beta function, 256 
Bias, 359, 360 
criterion, 359, 366 
in the model, 359, 366 
Biased estimator, 196 
Binomial distribution, 194 
Binomial random variable, 14, 203 
Bolzano-Weierstrass theorem, 11 
Borel field, 13 
Boundary, 262, 297 
point, 262 

Bounded sequence, 132, 262 
Bounded set, 9, 13, 72, 262 


Bounded variation, function of, 210, 212 
Bounds on eigenvalues, 41 
Box and Cox transformation, 120 
Box-Draper determinant criterion, 400
Box-Lucas criterion, 368, 422, 424-425 

Calibration, 242 

Canonical correlation coefficient, 56

Cartesian product, 3, 21 

Cauchy criterion, 137, 221 

Cauchy distribution, 240 

Cauchy-Schwarz inequality, 25, 229 

Cauchy sequence, 138, 222 

Cauchy’s condensation test, 153 

Cauchy’s mean value theorem, 103, 105, 129 

Cauchy’s product, 163 

Cauchy’s test, 149 

Cell, 294 

Center-point replications, 400
Central limit theorem, 120, 516 
Chain rule, 97, 116 
Change of variables, 219, 299 
Characteristic equation, 37
Characteristic function, 505 
inversion formula for, 507 
Characteristic root, 36 
Characteristic vector, 37 
Chebyshev polynomials, 415, 444 
discrete, 462
of the first kind, 444 
of the second kind, 445 
zeros of, 415, 444

Chebyshev’s inequality, 19, 184, 251, 259 
Chi-squared random variable, 246 
Christoffel’s identity, 443 
Closed set, 11-13, 72, 262 
Coded levels, 347 
Coefficient of variation, 242 


Cofactor, 31, 357 
Compact set, 13 
Comparison test, 144 
Complement of a set, 2
relative, 2 

Composite function, 6, 97, 265, 276 
Concave function, 79 
Condition number, 47 

Confidence intervals, simultaneous, 382, 384, 
388 

Consistent estimator, 83 
Constrained maximization, 329 
Constrained optimization, 288 
Constraints, equality, 288 
Continuous distribution, 14, 82
Continuous function, 66, 210, 264
absolutely, 91 
left, 68 

piecewise, 481, 486, 489, 498 
right, 68, 83 

Continuous random variable, 14, 82, 116, 239
Contrast, 384 

Controlled random search procedure, 336, 426 
Convergence 

almost surely, 192 
in distribution, 120 
in probability, 83, 191 
in quadratic mean, 191 
Convex function, 79, 84, 98 
Convex set, 79 
Convolution, 499 

Cornish-Fisher expansion, 460, 468 
Correlation coefficient, 311
Countable set, 6 
Covering of a set, 12 
open, 12 

Cubic spline regression model, 430
Cumulant generating function, 189 
Cumulants, 189, 459 

Cumulative distribution function, 14, 82, 304 
joint, 14 

Curvature, in response surface, 342 
Cylindrical data, 505 

d’Alembert’s test, 148 
Davidon-Fletcher-Powell method, 331 
Deleted neighborhood, 57, 262 
Density function, 14, 116, 239, 304 
approximation of, 456 
bivariate, 305 
marginal, 304 
Derivative, 93
directional, 273 


partial, 267 
total, 270 
Design 

A-optimal, 365 
approximate, 363 
Box-Behnken, 359 
center, 344, 356 

central composite, 347, 353, 358, 400 
dependence of, 368-369, 422 
D-optimal, 363, 365, 369 
efficiency of, 365 
E-optimal, 365
exact, 363 

first-order, 340, 356, 358 
G-optimal, 364 
Aj-optimal, 399 
matrix, 356 

measure, 362-363, 365 
moments, 362 
nonlinear model, 367 
optimal, 362, 425 
orthogonal, 358 
parameter-free, 426 
Plackett-Burman, 358 
points, 353, 360 
response surface, 339, 355, 359 
rotatable, 350, 356 
second-order, 344, 358 
3^k factorial, 358
2^k factorial, 340, 358
Design measure 
continuous, 362 
discrete, 363, 366
Determinant, 31
Hessian, 276 
Jacobian, 268 
Vandermonde’s, 411 
DETMAX algorithm, 366 
Diagonalization of a matrix, 38 
Differentiable function, 94, 113
Differential operator, 277 
Differentiation under the integral sign, 301
Directional derivative, 273
Direct product, 30, 44 
Direct search methods, 332 
Direct sum 
of matrices, 30 
of vector subspaces, 25 
Discontinuity, 67 
first kind, 67 
second kind, 67 
Discrete distribution, 14
Discrete random variable, 14, 182



Disjoint sets, 2 
Distribution 
binomial, 194 
Cauchy, 240 
chi-squared, 245 
continuous, 14, 82 
discrete, 14

exponential, 20, 91, 131 
F, 383-384 
gamma, 251 
hypergeometric, 463 
logarithmic series, 193
logistic, 241 
marginal, 304 
multinomial, 325 
negative binomial, 182, 195
normal, 120, 306, 311, 400 
Pearson Type IV, 503 
Poisson, 119, 183, 204 
power series, 193
t, 259 

truncated Poisson, 373 
uniform, 83, 92, 116, 250 
Domain of a function, 5 
Dot product, 23 

Eigenvalues, 36-37 
Eigenvectors, 37 
EM algorithm, 372, 375 
Empty set, 1 
Equal in distribution, 15 
Equality of matrices, 28 
Equivalence class, 4 
Equivalence theorem, 365 
Equivalent sets, 5 
Error loss 
absolute, 86 
squared, 86 

Estimable linear function, 45, 382, 387 
Estimation of unknown densities, 461 
Estimator 

admissible, 86 

best linear unbiased, 312 

biased, 196 

consistent, 83 

least absolute value, 327 

M, 327 

minimax, 86 

minimum norm quadratic unbiased, 380 
minimum variance quadratic unbiased, 382 
quadratic, 379 
ridge, 56, 196 

Euclidean norm, 24, 42, 179, 261 


Euclidean space, 21 
Euler’s theorem, 317 
Event, 13 

Exchangeable random variables, 15 
Exponential distribution, 20, 91, 131 
Extrema 

absolute, 113 
local, 113 

Factorial moment, 191 
Failure rate, 130 
F distribution, 383-384 
Fisher’s method of scoring, 374 
Fixed effects, 44 
Fourier 

coefficients, 472 
integral, 488
transform, 497, 507 
Fourier series, 471
convergence of, 475 
differentiation of, 483
integration of, 483
Fubini’s theorem, 297 
Function, 5 

absolutely continuous, 91 
bounded, 65, 72 
bounded variation, 210, 212 
composite, 6, 97, 265, 276 
concave, 79 

continuous, 67, 210, 264 
convex, 79, 84, 98 
differentiable, 94, 113
homogeneous, 317 
implicit, 273 
inverse, 6, 76, 102 
left continuous, 68 
limit of, 57 

Lipschitz continuous, 75, 409 
loss, 86 

lower semicontinuous, 90 
monotone, 76, 101, 116, 210 
multivariable, 261 
one-to-one, 5 
onto, 5 

periodic, 473, 489, 504 
Riemann integrable, 206, 210, 215
Riemann-Stieltjes intégrable, 234 
right continuous, 68, 83 
risk, 86 

uniformly continuous, 68, 74, 82, 265, 
407 

upper semicontinuous, 90 
Functionally dependent, 318



Function of a random variable, 116 
Fundamental theorem of calculus, 219 

Gamma distribution, 251 
Gamma function, 246, 532 
approximation for, 532 
Gaussian quadrature, 524 
Gauss-Markov theorem, 55 
Gauss’s test, 156 

Generalized distance approach, 370 
Generalized inverse, 36 
Generalized simulated annealing method, 338
Generalized variance, 363 
Geometric mean, 84, 230
Geometric series, 142
Gibbs phenomenon, 514 
Gradient, 275 
methods, 329 
Gram-Charlier series, 457
Gram-Schmidt orthonormalization, 24 
Greatest-integer function, 238 
Greatest lower bound, 9, 72 

Harmonic

frequencies, 501 
mean, 85 
series, 145
Hazard rate, 131 
Heine-Borel theorem, 13 
Hermite polynomials, 447 
applications of, 456, 542 
normalized, 461 
Hessian 

determinant, 276
matrix, 275, 285, 374 
Heteroscedasticity, 118 
Hölder’s inequality, 230-231
Homogeneous function, 317 
Euler’s theorem for, 317 
Homoscedasticity, 118 
Hotelling’s statistic, 49 
Householder matrix, 38 
Hypergeometric 
distribution, 463 
probability, 462 
series, 157

Idempotent matrix, 38 
Ill conditioning, 46
Image of a set, 5 
Implicit function, 273 
Implicit function theorem, 282 
Importance sampling, 537 


Improper integral, 220

absolutely convergent, 222, 226 
conditionally convergent, 222, 226 
of the first kind, 220, 528 
of the second kind, 221, 225 
Indefinite integral, 217
Indeterminate form, 107 
Inequality 

Cauchy-Schwarz, 25, 229 
Chebyshev’s, 19, 184, 251 
Hölder’s, 230, 231
Jensen’s, 84, 233
Markov’s, 19, 185 
Minkowski’s, 232 
Infimum, 9, 73 
Inner product, 23 
Input variables, 339, 351 
Integral

improper, 220 
indefinite, 217 
Riemann, 206, 293 
Riemann-Stieltjes, 234 
Integral test, 153, 224
Interaction, 43 
Interior point, 262 
Intermediate-value theorem, 71, 217 
Interpolation, 410 
error, 414, 416, 423 
Lagrange, 411, 419, 422 
points, 411-412, 414, 423 
Intersection, 2 

Interval of convergence, 174, 190 
Inverse function, 6, 76, 102 
Inverse function theorem, 280, 305 
Inverse of a matrix, 34 
Inverse regression, 242
Irrational number, 9 

Jacobian 

determinant, 268
matrix, 268 

Jacobi polynomials, 443 
applications of, 462 
Jensen’s inequality, 84, 233 
Jordan content, 298 
Jordan measurable, 298 

Kernel of a linear transformation, 26 
Khinchine’s theorem, 192 
Knots, 418, 431 
Kolmogorov’s theorem, 192 
Kronecker product, 30 
Kummer’s test, 154 



Lack of fit, 342, 399 

Lagrange interpolating polynomial, 411, 416, 
423, 518, 521 

Lagrange interpolation, 413, 419, 422 
accuracy of, 413 

Lagrange multipliers, 288, 312, 329, 341, 344, 
352, 383 

Laguerre polynomials, 451 
applications of, 462 
Laplace 

approximation, 531, 546, 548 
method of, 531, 534, 546 
Law of large numbers, 84 
Bernoulli’s, 204 
strong, 192 
weak, 192 

Least-squares estimate, 46, 344 
Least-squares polynomial approximation, 
453-455 

Least upper bound, 9, 73 
Legendre polynomials, 440, 442 
Leibniz’s formula, 127, 443, 449, 452 
Level of significance, 15 
L’Hospital’s rule, 103, 120, 254
Likelihood 
equations, 308, 373
function, 308, 373 
Limit, 57 
left sided, 59 
lower, 136 

of a multivariable function, 262 
one sided, 58 
right sided, 59 
of a sequence, 133 
subsequential, 136 
two sided, 58 
upper, 136 
Limit point, 11, 262 
Linearly dependent, 22
Linearly independent, 23 
Linear model, 43, 46, 121, 195, 355, 367, 

382 

balanced, 43 
first order, 340 
second order, 343 
Linear span, 23 
Linear transformation, 25 
kernel of, 26 
one-to-one, 27 
Lipschitz condition, 75 
Lipschitz continuous function, 75, 409
Logarithmic series distribution, 193
Logistic distribution, 241 


Loss function, 86 
Lower limit, 136 

Lower semicontinuous function, 90 
Lower sum, 206, 294 

Maclaurin’s integral test, 153
Maclaurin’s series, 111
Mapping, 5 

Markov chain Monte Carlo, 549 
Markov’s inequality, 19, 185 
Matrix, 28 
diagonal, 28 
full column rank, 34 
full rank, 34 
full row rank, 34 
Hessian, 275, 285, 374 
Householder, 38 
idempotent, 38 
identity, 28 
inverse of, 34 
Jacobian, 268 
moment, 363 
orthogonal, 38, 351, 357 
partitioned, 30 
rank of, 33 
singular, 32, 37 
skew symmetric, 29 
square, 28 
symmetric, 29 
trace of, 29 
transpose of, 29 
Maximum 

absolute, 113, 283, 346 
local, 113, 283 
relative, 113 

Maximum likelihood estimate, 308, 372 
Mean, 117, 239 

Mean response, 343, 359, 367, 423 
maximum of, 343 
Mean squared error, 360, 431 
integrated, 360, 431 
Mean value theorem, 99-100, 271 
for integrals, 217
Median, 124
Mertens’s theorem, 163 
Method of least squares, 121 
Minimax estimator, 86 
Minimization of prediction variance, 356
Minimum 

absolute, 113, 283, 336, 346 
local, 113, 283 
relative, 113 

Minimum bias estimation, 398 



Minimum norm quadratic unbiased estimation, 
327, 378 

Minkowski’s inequality, 232 
Minor, 31 

leading principal, 32, 40, 285 
principal, 32, 292 
Modal approximation, 549 
Modulus of continuity, 407 
Moment generating function, 186, 250, 506 
Moment matrix, 363 
Moments, 182, 505 
central, 182, 240, 457 
design, 362 
factorial, 191 
first negative, 242
noncentral, 182, 240 
region, 361

Monotone function, 76, 101, 210 
Monotone sequence, 133 
Monte Carlo method, 535, 540 
error bound for, 537 
variance reduction, 537
Monte Carlo simulation, 83 
Multicollinearity, 46, 196, 353, 387 
Multimodal function, 336, 338 
Multinomial distribution, 325 
Multiresponse 
experiment, 370 
function, 370 
optimization, 370 
Multivariable function, 261 
composite, 265, 276 
continuous, 264
inverse of, 280 
limit of, 262
optima of, 283 
partial derivative of, 267
Riemann intégral of, 293 
Taylor’s theorem for, 277 
uniformly continuous, 265

Negative binomial, 182, 195
Neighborhood, 10, 262 
deleted, 57, 262 
Newton-Cotes methods, 523 
Newton-Raphson method, 331, 373 
Noncentral chi-squared, 44 
Noncentrality parameter, 44 
Nonlinear model, 367 
approximate linearization of, 422 
design for, 367 
Nonlinear parameter, 367 
Norm, 179 


Euclidean, 24, 42, 179, 261 
minimum, 380 
of a partition, 205 
spectral, 179 

Normal distribution, 120, 306 
bivariate, 311 
p-variate, 400 
Normal integral, 460, 469
approximation of, 460, 469 
Normal random variable, 245 
Null space, 26 

One-to-one correspondence, 5 
One-to-one function, 5 
One-way classification model, 387 
o notation, 66, 117
O notation, 65, 156 
Open set, 10, 262 
Optimality of design 
A, 365 

D, 364, 367, 431 
E, 365
G, 364 

Optimization techniques, 339 
Optimum 
absolute, 113 
compromise, 371 
conditions, 370 
ideal, 371
local, 113, 284 
Ordered n-tuple, 4 
Ordered pair, 3 
Orthogonal complement, 25
Orthogonal design, 358 
Orthogonal functions, 437 
Orthogonal matrix, 38, 351, 357 
parameterization, 49 
Orthogonal polynomials, 437, 453 
Chebyshev, 444 
of the first kind, 444 
of the second kind, 445 
zeros of, 444
on a finite set, 455 
Hermite, 447 
applications of, 456 
Jacobi, 443 
Laguerre, 451 

least-squares approximation with, 453 
Legendre, 440, 442 
sequence of, 437 
zeros of, 440
Orthogonal vectors, 24 
Orthonormal basis, 24 



Parameterization of orthogonal matrices, 49 
Parseval’s theorem, 496
Partial derivative, 267
Partially nonlinear model, 369 
Partial sum, 140 
Partition, 205, 294 
of a set, 5 
Periodic 

extension, 482 
function, 473, 489, 504 
Periodogram, 501 

Piecewise continuous function, 481, 486, 489,
498 

Poisson approximation, 194 
Poisson distribution, 119, 183, 204 
truncated, 373 
Poisson process, 122, 131 
Poisson random variable, 14, 124, 183 
Polynomial 

approximation, 403 
Bernstein, 406, 410 
Chebyshev, 444 
Hermite, 447, 456, 542 
interpolation, 410 
Jacobi, 443, 462 

Lagrange interpolating, 411, 416, 423 
Laguerre, 451, 462 
Legendre, 440, 442 
piecewise, 418 
trigonometric, 495, 500, 504
Power series, 174
distribution, 194 
Prediction equation, 348
Prediction variance, 347, 350, 352, 356
minimization of, 356 
standardized, 363-364 
Principal components, 328 
Probability of an event, 13 
Probability generating function, 190 
continuity theorem for, 192 
Probability space, 13 
Product of matrices, 29 
Projection of a vector, 25 
Proper subset, 1 

Quadratic form, 39 
extrema of, 48 
nonnegative definite, 40 
positive definite, 40 
positive semidefinite, 40 
Quadrature, 524 
Gauss-Chebyshev, 526 
Gauss-Hermite, 528, 542 


Gauss-Jacobi, 526 
Gauss-Laguerre, 528 
Gauss-Legendre, 526 

Raabe’s test, 155 
Radius of convergence, 174 
Random effects, 44 
Random variable, 13 
absolutely continuous, 116, 239
Bernoulli, 194 
binomial, 14, 203 
chi-squared, 246 
continuous, 14, 82, 116, 239
discrete, 14, 182
function of, 116 
normal, 245 
Poisson, 14, 124, 183 
Random vector, 304 
transformations of, 305 
Range of a function, 5 
Rank of a matrix, 33 
Rational number, 8 
Ratio test, 148 
Rayleigh’s quotient, 41 
infimum of, 41 
supremum of, 41 
Rearrangement of series, 159
Refinement, 206, 214, 294 
Region of convergence, 174
Relation, 4 
congruence, 5 
equivalence, 4
reflexive, 4 
symmetric, 4 
transitive, 4 

Remainder, 110, 257, 279 
Response 

constrained, 352 
maximum, 340 
minimum, 340 
optimum, 350 

predicted, 341, 344, 355, 504 
primary, 352 
surface, 340 

Response surface design, 339, 355, 359 
Response surface methodology, 339, 355, 367, 
504 

Response variable, 121 
Restricted least squares, 429 
Ridge analysis, 343, 349, 352 
modified, 350, 354 
standard, 354 
Ridge estimator, 56, 196 



Ridge plots, 346 
Ridge regression, 195
Riemann integral, 206, 293
double, 295 
iterated, 295 
n-tuple, 295 

Riemann-Stieltjes integral, 234
Risk function, 86 

Rodrigues formula, 440, 443, 447, 451 
Rolle’s theorem, 99, 110 
Root test, 149 
Rotatability, 347 

Saddle point, 114, 284, 344, 349 
Saddlepoint approximation, 549 
Sample, 14 
Sample space, 13 
Sample variance, 14 
Satterthwaite’s approximation, 324 
Scalar, 21 
product, 23 

Scheffé’s confidence intervals, 382, 384 
Schur’s theorem, 42 
Sequence, 132, 262 
bounded, 132, 262 
Cauchy, 138, 222 
convergent, 133, 262 
divergent, 133 
of functions, 165, 227
limit of, 133 

limit infimum of, 136, 148, 151 
limit supremum of, 136, 148, 151 
of matrices, 178 
monotone, 133 

of orthogonal polynomials, 437 
subsequential limit of, 136 
uniformly convergent, 166, 169, 228 
Series, 140

absolutely convergent, 143, 158, 174 

alternating, 158 

conditionally convergent, 158 

convergent, 141 

divergent, 141 

Fourier, 471 

of functions, 165

geometric, 142

Gram-Charlier, 457 

harmonic, 145

hypergeometric, 157 

Maclaurin’s, 111 

of matrices, 178, 180 

multiplication of, 162 

of positive terms, 144 

power, 174 


rearrangement of, 159 
sum of, 140 
Taylor’s, 111, 121
trigonometric, 471
uniformly convergent, 166, 169 
Set(s), 1 

bounded, 9, 72, 262 
closed, 11, 72, 262 
compact, 13 
complement of, 2
connected, 12 
convex, 79 
countable, 6 
covering of, 12 
disconnected, 12 
empty, 1 
finite, 6 
image of, 5 
open, 10, 262 
partition of, 5 
subcovering of, 13 
uncountable, 7 
universal, 2 

Sherman-Morrison formula, 54 
Sherman-Morrison-Woodbury formula, 54, 
389 

Simplex method, 332 
Simpson’s method, 521 
Simulated annealing, 338 
Simultaneous diagonalization, 40 
Singularity of a function, 225 
Singular-value decomposition, 39, 45
Singular values of a matrix, 46 
Size of test, 15 

Spectral decomposition theorem, 38, 181, 351
Spherical polar coordinates, 322 
Spline functions, 418, 428 
approximation, 418, 421 
cubic, 419, 428 
designs for, 430 
error of approximation by, 421 
linear, 418 
properties of, 418 
Squared error loss, 86 
Stationary point, 284, 345 
Stationary value, 113 
Statistic, 14 
first-order, 20 
Hotelling’s T², 49
nth-order, 20
Steepest ascent 
method of, 330, 340 
path of, 396 
Steepest descent 





method of, 329 
path of, 343 
Step function, 237 
Stieltjes moment problem, 185 
Stirling’s formula, 532 
Stratified sampling, 313 
Stratum, 313 
Submatrix, 29 
leading principal, 29 
principal, 29 
Subsequence, 135, 262 
Subsequential limit, 136 
Subset, 1 
Supremum, 9, 73 
Symmetric difference, 17

Taylor’s formula, 110 
Taylor’s series, 111, 121
remainder of, 110, 257, 279 
Taylor’s theorem, 108, 114, 277 
t distribution, 259 
Time series, 500
Topological space, 10 
Topology, 10 
basis for, 10 
Total derivative, 270
Transformation 
Box and Cox, 120 

of continuous random variables, 246
of random vectors, 305 
variance stabilizing, 118 
Translation invariance, 379 
Trapezoidal method, 517 
Triangle inequality, 25 


Trigonometric
polynomial(s), 495, 500, 504
series, 471

Two-way crossed classification model, 43, 401 

Uncountable set, 7 
Uniform convergence 

of sequences, 166, 169, 228 
of series, 167, 169

Uniform distribution, 83, 92, 116, 250 
Uniformly continuous function, 68, 74, 82, 265, 407
Union, 2 
Universal set, 2 
Upper limit, 136 

Upper semicontinuous function, 90 
Upper sum, 206, 294 

Variance, 117, 239 
Variance components, 44, 378 
Variance criterion, 359, 366 
Variance stabilizing transformation, 118 
Vector, 21 
column, 28 
row, 28 
zero, 21

Vector space, 21 
basis for, 23 
dimension of, 23 
Vector subspace, 22 
Voronovsky’s theorem, 410 

Weierstrass approximation theorem, 403 
Weierstrass’s M-test, 167, 172 



WILEY SERIES IN PROBABILITY AND STATISTICS 

Established by WALTER A. SHEWHART and SAMUEL S. WILKS 

Editors: David J. Balding, Peter Bloomfield, Noël A. C. Cressie,
Nicholas I. Fisher, Iain M. Johnstone, J. B. Kadane, Louise M. Ryan,
David W. Scott, Adrian F. M. Smith, Jozef L. Teugels
Editors Emeriti: Vic Barnett, J. Stuart Hunter, David G. Kendall


A complete list of the titles in this series appears at the end of this volume.





The Wiley Series in Probability and Statistics is well established and authoritative. It covers
many topics of current research interest in both pure and applied statistics and probability
theory. Written by leading statisticians and institutions, the titles span both state-of-the-art
developments in the field and classical methods.

Reflecting the wide range of current research in statistics, the series encompasses applied,
methodological and theoretical statistics, ranging from applications and new techniques
made possible by advances in computerized practice to rigorous treatment of theoretical
approaches.

This series provides essential and invaluable reading for all statisticians, whether in
academia, industry, government, or research.


ABRAHAM and LEDOLTER • Statistical Methods for Forecasting
AGRESTI • Analysis of Ordinal Categorical Data
AGRESTI • An Introduction to Categorical Data Analysis
AGRESTI • Categorical Data Analysis, Second Edition
ANDĚL • Mathematics of Chance
ANDERSON • An Introduction to Multivariate Statistical Analysis, Second Edition
*ANDERSON • The Statistical Analysis of Time Series
ANDERSON, AUQUIER, HAUCK, OAKES, VANDAELE, and WEISBERG •
Statistical Methods for Comparative Studies
ANDERSON and LOYNES • The Teaching of Practical Statistics
ARMITAGE and DAVID (editors) • Advances in Biometry
ARNOLD, BALAKRISHNAN, and NAGARAJA • Records
*ARTHANARI and DODGE • Mathematical Programming in Statistics
*BAILEY • The Elements of Stochastic Processes with Applications to the Natural
Sciences
BALAKRISHNAN and KOUTRAS • Runs and Scans with Applications
BARNETT • Comparative Statistical Inference, Third Edition
BARNETT and LEWIS • Outliers in Statistical Data, Third Edition
BARTOSZYNSKI and NIEWIADOMSKA-BUGAJ • Probability and Statistical Inference
BASILEVSKY • Statistical Factor Analysis and Related Methods: Theory and
Applications
BASU and RIGDON • Statistical Methods for the Reliability of Repairable Systems
BATES and WATTS • Nonlinear Regression Analysis and Its Applications
BECHHOFER, SANTNER, and GOLDSMAN • Design and Analysis of Experiments for
Statistical Selection, Screening, and Multiple Comparisons
BELSLEY • Conditioning Diagnostics: Collinearity and Weak Data in Regression
BELSLEY, KUH, and WELSCH • Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity
BENDAT and PIERSOL • Random Data: Analysis and Measurement Procedures,
Third Edition


*Now available in a lower priced paperback edition in the Wiley Classics Library.



BERRY, CHALONER, and GEWEKE • Bayesian Analysis in Statistics and
Econometrics: Essays in Honor of Arnold Zellner
BERNARDO and SMITH • Bayesian Theory
BHAT and MILLER • Elements of Applied Stochastic Processes, Third Edition
BHATTACHARYA and JOHNSON • Statistical Concepts and Methods
BHATTACHARYA and WAYMIRE • Stochastic Processes with Applications
BILLINGSLEY • Convergence of Probability Measures, Second Edition
BILLINGSLEY • Probability and Measure, Third Edition
BIRKES and DODGE • Alternative Methods of Regression
BLISCHKE and MURTHY • Reliability: Modeling, Prediction, and Optimization
BLOOMFIELD • Fourier Analysis of Time Series: An Introduction, Second Edition
BOLLEN • Structural Equations with Latent Variables
BOROVKOV • Ergodicity and Stability of Stochastic Processes
BOULEAU • Numerical Methods for Stochastic Processes
BOX • Bayesian Inference in Statistical Analysis
BOX • R. A. Fisher, the Life of a Scientist
BOX and DRAPER • Empirical Model-Building and Response Surfaces
*BOX and DRAPER • Evolutionary Operation: A Statistical Method for Process
Improvement
BOX, HUNTER, and HUNTER • Statistics for Experimenters: An Introduction to
Design, Data Analysis, and Model Building
BOX and LUCEÑO • Statistical Control by Monitoring and Feedback Adjustment
BRANDIMARTE • Numerical Methods in Finance: A MATLAB-Based Introduction
BROWN and HOLLANDER • Statistics: A Biomedical Introduction
BRUNNER, DOMHOF, and LANGER • Nonparametric Analysis of Longitudinal Data in
Factorial Experiments
BUCKLEW • Large Deviation Techniques in Decision, Simulation, and Estimation
CAIROLI and DALANG • Sequential Stochastic Optimization
CHAN • Time Series: Applications to Finance
CHATTERJEE and HADI • Sensitivity Analysis in Linear Regression
CHATTERJEE and PRICE • Regression Analysis by Example, Third Edition
CHERNICK • Bootstrap Methods: A Practitioner’s Guide
CHILÈS and DELFINER • Geostatistics: Modeling Spatial Uncertainty
CHOW and LIU • Design and Analysis of Clinical Trials: Concepts and Methodologies
CLARKE and DISNEY • Probability and Random Processes: A First Course with
Applications, Second Edition
*COCHRAN and COX • Experimental Designs, Second Edition
CONGDON • Bayesian Statistical Modelling
CONOVER • Practical Nonparametric Statistics, Second Edition
COOK • Regression Graphics
COOK and WEISBERG • Applied Regression Including Computing and Graphics
COOK and WEISBERG • An Introduction to Regression Graphics
CORNELL • Experiments with Mixtures, Designs, Models, and the Analysis of Mixture
Data, Third Edition
COVER and THOMAS • Elements of Information Theory
COX • A Handbook of Introductory Statistical Methods
*COX • Planning of Experiments
CRESSIE • Statistics for Spatial Data, Revised Edition
CSÖRGŐ and HORVÁTH • Limit Theorems in Change Point Analysis
DANIEL • Applications of Statistics to Industrial Experimentation
DANIEL • Biostatistics: A Foundation for Analysis in the Health Sciences, Sixth Edition
*DANIEL • Fitting Equations to Data: Computer Analysis of Multifactor Data,
Second Edition


DAVID • Order Statistics, Second Edition
*DEGROOT, FIENBERG, and KADANE • Statistics and the Law
DEL CASTILLO • Statistical Process Adjustment for Quality Control
DETTE and STUDDEN • The Theory of Canonical Moments with Applications in
Statistics, Probability, and Analysis
DEY and MUKERJEE • Fractional Factorial Plans
DILLON and GOLDSTEIN • Multivariate Analysis: Methods and Applications
DODGE • Alternative Methods of Regression
*DODGE and ROMIG • Sampling Inspection Tables, Second Edition
*DOOB • Stochastic Processes
DOWDY and WEARDEN • Statistics for Research, Second Edition
DRAPER and SMITH • Applied Regression Analysis, Third Edition
DRYDEN and MARDIA • Statistical Shape Analysis
DUDEWICZ and MISHRA • Modern Mathematical Statistics
DUNN and CLARK • Applied Statistics: Analysis of Variance and Regression, Second
Edition
DUNN and CLARK • Basic Statistics: A Primer for the Biomedical Sciences,
Third Edition
DUPUIS and ELLIS • A Weak Convergence Approach to the Theory of Large Deviations
*ELANDT-JOHNSON and JOHNSON • Survival Models and Data Analysis
ETHIER and KURTZ • Markov Processes: Characterization and Convergence
EVANS, HASTINGS, and PEACOCK • Statistical Distributions, Third Edition
FELLER • An Introduction to Probability Theory and Its Applications, Volume I,
Third Edition, Revised; Volume II, Second Edition
FISHER and VAN BELLE • Biostatistics: A Methodology for the Health Sciences
*FLEISS • The Design and Analysis of Clinical Experiments
FLEISS • Statistical Methods for Rates and Proportions, Second Edition
FLEMING and HARRINGTON • Counting Processes and Survival Analysis
FULLER • Introduction to Statistical Time Series, Second Edition
FULLER • Measurement Error Models
GALLANT • Nonlinear Statistical Models
GHOSH, MUKHOPADHYAY, and SEN • Sequential Estimation
GIFI • Nonlinear Multivariate Analysis
GLASSERMAN and YAO • Monotone Structure in Discrete-Event Systems
GNANADESIKAN • Methods for Statistical Data Analysis of Multivariate Observations,
Second Edition
GOLDSTEIN and LEWIS • Assessment: Problems, Development, and Statistical Issues
GREENWOOD and NIKULIN • A Guide to Chi-Squared Testing
GROSS and HARRIS • Fundamentals of Queueing Theory, Third Edition
*HAHN • Statistical Models in Engineering
HAHN and MEEKER • Statistical Intervals: A Guide for Practitioners
HALD • A History of Probability and Statistics and their Applications Before 1750
HALD • A History of Mathematical Statistics from 1750 to 1930
HAMPEL • Robust Statistics: The Approach Based on Influence Functions
HANNAN and DEISTLER • The Statistical Theory of Linear Systems
HEIBERGER • Computation for the Analysis of Designed Experiments
HEDAYAT and SINHA • Design and Inference in Finite Population Sampling
HELLER • MACSYMA for Statisticians
HINKELMAN and KEMPTHORNE • Design and Analysis of Experiments, Volume 1:
Introduction to Experimental Design
HOAGLIN, MOSTELLER, and TUKEY • Exploratory Approach to Analysis
of Variance
HOAGLIN, MOSTELLER, and TUKEY • Exploring Data Tables, Trends and Shapes


*HOAGLIN, MOSTELLER, and TUKEY • Understanding Robust and Exploratory
Data Analysis
HOCHBERG and TAMHANE • Multiple Comparison Procedures
HOCKING • Methods and Applications of Linear Models: Regression and the Analysis
of Variance
HOEL • Introduction to Mathematical Statistics, Fifth Edition
HOGG and KLUGMAN • Loss Distributions
HOLLANDER and WOLFE • Nonparametric Statistical Methods, Second Edition
HOSMER and LEMESHOW • Applied Logistic Regression, Second Edition
HOSMER and LEMESHOW • Applied Survival Analysis: Regression Modeling of
Time to Event Data
HØYLAND and RAUSAND • System Reliability Theory: Models and Statistical Methods
HUBER • Robust Statistics
HUBERTY • Applied Discriminant Analysis
HUNT and KENNEDY • Financial Derivatives in Theory and Practice
HUSKOVA, BERAN, and DUPAC • Collected Works of Jaroslav Hajek—
with Commentary
IMAN and CONOVER • A Modern Approach to Statistics
JACKSON • A User’s Guide to Principal Components
JOHN • Statistical Methods in Engineering and Quality Assurance
JOHNSON • Multivariate Statistical Simulation
JOHNSON and BALAKRISHNAN • Advances in the Theory and Practice of Statistics: A
Volume in Honor of Samuel Kotz
JUDGE, GRIFFITHS, HILL, LÜTKEPOHL, and LEE • The Theory and Practice of
Econometrics, Second Edition
JOHNSON and KOTZ • Distributions in Statistics
JOHNSON and KOTZ (editors) • Leading Personalities in Statistical Sciences: From the
Seventeenth Century to the Present
JOHNSON, KOTZ, and BALAKRISHNAN • Continuous Univariate Distributions,
Volume 1, Second Edition
JOHNSON, KOTZ, and BALAKRISHNAN • Continuous Univariate Distributions,
Volume 2, Second Edition
JOHNSON, KOTZ, and BALAKRISHNAN • Discrete Multivariate Distributions
JOHNSON, KOTZ, and KEMP • Univariate Discrete Distributions, Second Edition
JURECKOVA and SEN • Robust Statistical Procedures: Asymptotics and Interrelations
JUREK and MASON • Operator-Limit Distributions in Probability Theory
KADANE • Bayesian Methods and Ethics in a Clinical Trial Design
KADANE and SCHUM • A Probabilistic Analysis of the Sacco and Vanzetti Evidence
KALBFLEISCH and PRENTICE • The Statistical Analysis of Failure Time Data, Second
Edition
KASS and VOS • Geometrical Foundations of Asymptotic Inference
KAUFMAN and ROUSSEEUW • Finding Groups in Data: An Introduction to Cluster
Analysis
KEDEM and FOKIANOS • Regression Models for Time Series Analysis
KENDALL, BARDEN, CARNE, and LE • Shape and Shape Theory
KHURI • Advanced Calculus with Applications in Statistics, Second Edition
KHURI, MATHEW, and SINHA • Statistical Tests for Mixed Linear Models
KLUGMAN, PANJER, and WILLMOT • Loss Models: From Data to Decisions
KLUGMAN, PANJER, and WILLMOT • Solutions Manual to Accompany Loss Models:
From Data to Decisions
KOTZ, BALAKRISHNAN, and JOHNSON • Continuous Multivariate Distributions,
Volume 1, Second Edition
KOTZ and JOHNSON (editors) • Encyclopedia of Statistical Sciences: Volumes 1 to 9
with Index


KOTZ and JOHNSON (editors) • Encyclopedia of Statistical Sciences: Supplement
Volume
KOTZ, READ, and BANKS (editors) • Encyclopedia of Statistical Sciences: Update
Volume 1
KOTZ, READ, and BANKS (editors) • Encyclopedia of Statistical Sciences: Update
Volume 2
KOVALENKO, KUZNETZOV, and PEGG • Mathematical Theory of Reliability of
Time-Dependent Systems with Practical Applications
LACHIN • Biostatistical Methods: The Assessment of Relative Risks
LAD • Operational Subjective Statistical Methods: A Mathematical, Philosophical, and
Historical Introduction
LAMPERTI • Probability: A Survey of the Mathematical Theory, Second Edition
LANGE, RYAN, BILLARD, BRILLINGER, CONQUEST, and GREENHOUSE •
Case Studies in Biometry
LARSON • Introduction to Probability Theory and Statistical Inference, Third Edition
LAWLESS • Statistical Models and Methods for Lifetime Data
LAWSON • Statistical Methods in Spatial Epidemiology
LE • Applied Categorical Data Analysis
LE • Applied Survival Analysis
LEE and WANG • Statistical Methods for Survival Data Analysis, Third Edition
LePAGE and BILLARD • Exploring the Limits of Bootstrap
LEYLAND and GOLDSTEIN (editors) • Multilevel Modelling of Health Statistics
LIAO • Statistical Group Comparison
LINDVALL • Lectures on the Coupling Method
LINHART and ZUCCHINI • Model Selection
LITTLE and RUBIN • Statistical Analysis with Missing Data, Second Edition
LLOYD • The Statistical Analysis of Categorical Data
MAGNUS and NEUDECKER • Matrix Differential Calculus with Applications in
Statistics and Econometrics, Revised Edition
MALLER and ZHOU • Survival Analysis with Long Term Survivors
MALLOWS • Design, Data, and Analysis by Some Friends of Cuthbert Daniel
MANN, SCHAFER, and SINGPURWALLA • Methods for Statistical Analysis of
Reliability and Life Data
MANTON, WOODBURY, and TOLLEY • Statistical Applications Using Fuzzy Sets
MARDIA and JUPP • Directional Statistics
MASON, GUNST, and HESS • Statistical Design and Analysis of Experiments with
Applications to Engineering and Science
McCULLOCH and SEARLE • Generalized, Linear, and Mixed Models
McFADDEN • Management of Data in Clinical Trials
McLACHLAN • Discriminant Analysis and Statistical Pattern Recognition
McLACHLAN and KRISHNAN • The EM Algorithm and Extensions
McLACHLAN and PEEL • Finite Mixture Models
McNEIL • Epidemiological Research Methods
MEEKER and ESCOBAR • Statistical Methods for Reliability Data
MEERSCHAERT and SCHEFFLER • Limit Distributions for Sums of Independent
Random Vectors: Heavy Tails in Theory and Practice
*MILLER • Survival Analysis, Second Edition
MONTGOMERY, PECK, and VINING • Introduction to Linear Regression Analysis,
Third Edition
MORGENTHALER and TUKEY • Configural Polysampling: A Route to Practical
Robustness
MUIRHEAD • Aspects of Multivariate Statistical Theory
MURRAY • X-STAT 2.0 Statistical Experimentation, Design Data Analysis, and
Nonlinear Optimization


MYERS and MONTGOMERY • Response Surface Methodology: Process and Product
Optimization Using Designed Experiments, Second Edition
MYERS, MONTGOMERY, and VINING • Generalized Linear Models. With
Applications in Engineering and the Sciences
NELSON • Accelerated Testing, Statistical Models, Test Plans, and Data Analyses
NELSON • Applied Life Data Analysis
NEWMAN • Biostatistical Methods in Epidemiology
OCHI • Applied Probability and Stochastic Processes in Engineering and Physical
Sciences
OKABE, BOOTS, SUGIHARA, and CHIU • Spatial Tessellations: Concepts and
Applications of Voronoi Diagrams, Second Edition
OLIVER and SMITH • Influence Diagrams, Belief Nets and Decision Analysis
PANKRATZ • Forecasting with Dynamic Regression Models
PANKRATZ • Forecasting with Univariate Box-Jenkins Models: Concepts and Cases
*PARZEN • Modern Probability Theory and Its Applications
PENA, TIAO, and TSAY • A Course in Time Series Analysis
PIANTADOSI • Clinical Trials: A Methodologic Perspective
PORT • Theoretical Probability for Applications
POURAHMADI • Foundations of Time Series Analysis and Prediction Theory
PRESS • Bayesian Statistics: Principles, Models, and Applications
PRESS and TANUR • The Subjectivity of Scientists and the Bayesian Approach
PUKELSHEIM • Optimal Experimental Design
PURI, VILAPLANA, and WERTZ • New Perspectives in Theoretical and Applied
Statistics
PUTERMAN • Markov Decision Processes: Discrete Stochastic Dynamic Programming
*RAO • Linear Statistical Inference and Its Applications, Second Edition
RENCHER • Linear Models in Statistics
RENCHER • Methods of Multivariate Analysis, Second Edition
RENCHER • Multivariate Statistical Inference with Applications
RIPLEY • Spatial Statistics
RIPLEY • Stochastic Simulation
ROBINSON • Practical Strategies for Experimenting
ROHATGI and SALEH • An Introduction to Probability and Statistics, Second Edition
ROLSKI, SCHMIDLI, SCHMIDT, and TEUGELS • Stochastic Processes for Insurance
and Finance
ROSENBERGER and LACHIN • Randomization in Clinical Trials: Theory and Practice
ROSS • Introduction to Probability and Statistics for Engineers and Scientists
ROUSSEEUW and LEROY • Robust Regression and Outlier Detection
RUBIN • Multiple Imputation for Nonresponse in Surveys
RUBINSTEIN • Simulation and the Monte Carlo Method
RUBINSTEIN and MELAMED • Modern Simulation and Modeling
RYAN • Modern Regression Methods
RYAN • Statistical Methods for Quality Improvement, Second Edition
SALTELLI, CHAN, and SCOTT (editors) • Sensitivity Analysis
*SCHEFFE • The Analysis of Variance
SCHIMEK • Smoothing and Regression: Approaches, Computation, and Application
SCHOTT • Matrix Analysis for Statistics
SCHUSS • Theory and Applications of Stochastic Differential Equations
SCOTT • Multivariate Density Estimation: Theory, Practice, and Visualization
*SEARLE • Linear Models
SEARLE • Linear Models for Unbalanced Data
SEARLE • Matrix Algebra Useful for Statistics
SEARLE, CASELLA, and McCULLOCH • Variance Components
SEARLE and WILLETT • Matrix Algebra for Applied Economics
SEBER • Linear Regression Analysis


SEBER • Multivariate Observations
SEBER and WILD • Nonlinear Regression
SENNOTT • Stochastic Dynamic Programming and the Control of Queueing Systems
*SERFLING • Approximation Theorems of Mathematical Statistics
SHAFER and VOVK • Probability and Finance: It’s Only a Game!
SMALL and McLEISH • Hilbert Space Methods in Probability and Statistical Inference
SRIVASTAVA • Methods of Multivariate Statistics
STAPLETON • Linear Statistical Models
STAUDTE and SHEATHER • Robust Estimation and Testing
STOYAN, KENDALL, and MECKE • Stochastic Geometry and Its Applications, Second
Edition
STOYAN and STOYAN • Fractals, Random Shapes and Point Fields: Methods of
Geometrical Statistics
STYAN • The Collected Papers of T. W. Anderson: 1943-1985
SUTTON, ABRAMS, JONES, SHELDON, and SONG • Methods for Meta-Analysis in
Medical Research
TANAKA • Time Series Analysis: Nonstationary and Noninvertible Distribution Theory
THOMPSON • Empirical Model Building
THOMPSON • Sampling, Second Edition
THOMPSON • Simulation: A Modeler’s Approach
THOMPSON and SEBER • Adaptive Sampling
THOMPSON, WILLIAMS, and FINDLAY • Models for Investors in Real World Markets
TIAO, BISGAARD, HILL, PENA, and STIGLER (editors) • Box on Quality and
Discovery: with Design, Control, and Robustness
TIERNEY • LISP-STAT: An Object-Oriented Environment for Statistical Computing
and Dynamic Graphics
TSAY • Analysis of Financial Time Series
UPTON and FINGLETON • Spatial Data Analysis by Example, Volume II:
Categorical and Directional Data
VAN BELLE • Statistical Rules of Thumb
VIDAKOVIC • Statistical Modeling by Wavelets
WEISBERG • Applied Linear Regression, Second Edition
WELSH • Aspects of Statistical Inference
WESTFALL and YOUNG • Resampling-Based Multiple Testing: Examples and
Methods for p-Value Adjustment
WHITTAKER • Graphical Models in Applied Multivariate Statistics
WINKER • Optimization Heuristics in Economics: Applications of Threshold Accepting
WONNACOTT and WONNACOTT • Econometrics, Second Edition
WOODING • Planning Pharmaceutical Clinical Trials: Basic Statistical Principles
WOOLSON and CLARKE • Statistical Methods for the Analysis of Biomedical Data,
Second Edition
WU and HAMADA • Experiments: Planning, Analysis, and Parameter Design
Optimization
YANG • The Construction Theory of Denumerable Markov Processes
*ZELLNER • An Introduction to Bayesian Inference in Econometrics
ZHOU, OBUCHOWSKI, and McCLISH • Statistical Methods in Diagnostic Medicine

